Time-based rolling groupby() mean on a non-index datetime column

import pandas as pd

# five games, two players each; times step back one hour per row
df = pd.DataFrame({'game_t': [pd.Timestamp.now() - pd.Timedelta(hours=n) for n in range(10)],
                   'player': [*'abacabaccb'],
                   'wl': ['w', 'l'] * 5,
                   'gid': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]})
# both rows of a game share the timestamp of the game's first row
df.game_t = df.groupby('gid').game_t.transform('first')
df
#                       game_t player wl  gid
# 0 2019-07-05 15:00:23.840588      a  w    1
# 1 2019-07-05 15:00:23.840588      b  l    1
# 2 2019-07-05 13:00:23.840605      a  w    2
# 3 2019-07-05 13:00:23.840605      c  l    2
# 4 2019-07-05 11:00:23.840611      a  w    3
# 5 2019-07-05 11:00:23.840611      b  l    3
# 6 2019-07-05 09:00:23.840618      a  w    4
# 7 2019-07-05 09:00:23.840618      c  l    4
# 8 2019-07-05 07:00:23.840623      c  w    5
# 9 2019-07-05 07:00:23.840623      b  l    5

#                       game_t player wl  gid    bta
# 0 2019-07-05 15:00:23.840588      a  w    1   True
# 1 2019-07-05 15:00:23.840588      b  l    1  False
# 2 2019-07-05 13:00:23.840605      a  w    2   True
# 3 2019-07-05 13:00:23.840605      c  l    2  False
# 4 2019-07-05 11:00:23.840611      a  w    3   True
# 5 2019-07-05 11:00:23.840611      b  l    3  False
# 6 2019-07-05 09:00:23.840618      a  w    4  False
# 7 2019-07-05 09:00:23.840618      c  l    4   True
# 8 2019-07-05 07:00:23.840623      c  w    5  False
# 9 2019-07-05 07:00:23.840623      b  l    5  False

1 Answer

User

#1 · Posted 2024-05-29 05:01:00

I don't think either of your concerns is really a problem:

game_t is not the index: set it as the index
game_t is not monotonic: sort it

Here is my solution:

# sort by time and make game_t the index
df = df.sort_values('game_t').set_index('game_t')

# win flag to average over in the rolling window
df['is_win'] = df.wl.eq('w')

# closed='left' excludes the current game from its own 4.5-hour window
win_mean = (df.groupby('player')
              .is_win.rolling('4.5h', closed='left')
              .mean().reset_index()
           )

# both frames carry is_win, so the merge suffixes them as is_win_x / is_win_y
df = df.reset_index().merge(win_mean, on=['game_t', 'player'])
df['bta'] = df.is_win_y.gt(0.5)

df.sort_values(['gid', 'wl'], ascending=[True, False])

This gives:

                      game_t player wl  gid  is_win_x  is_win_y    bta
8 2019-07-05 15:00:23.840588      a  w    1      True       1.0   True
9 2019-07-05 15:00:23.840588      b  l    1     False       0.0  False
6 2019-07-05 13:00:23.840605      a  w    2      True       1.0   True
7 2019-07-05 13:00:23.840605      c  l    2     False       0.0  False
4 2019-07-05 11:00:23.840611      a  w    3      True       1.0   True
5 2019-07-05 11:00:23.840611      b  l    3     False       0.0  False
2 2019-07-05 09:00:23.840618      a  w    4      True       NaN  False
3 2019-07-05 09:00:23.840618      c  l    4     False       1.0   True
0 2019-07-05 07:00:23.840623      c  w    5      True       NaN  False
1 2019-07-05 07:00:23.840623      b  l    5     False       NaN  False

You can drop the two is_win columns if you like.
