Pandas: computing overlap with a pivot table or crosstab

Posted 2024-06-16 10:17:03


I am trying to compute overlaps between some of the data in a DataFrame. Here is a simple example:

df=pd.DataFrame({
'player':['A', 'B', 'C', 'D', 'A', 'C', 'B'], 
'game':['gameA', 'gameB', 'gameC', 'gameC', 'gameB', 'gameD', 'gameA']})

df:

    game player
0  gameA      A
1  gameB      B
2  gameC      C
3  gameC      D
4  gameB      A
5  gameD      C
6  gameA      B

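What I want is to count, for every pair of games, the number of players who played in both.

For example, the result should look like this: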

   game1 game2   overlap
  gameA  gameB        2  # because 2 players (A and B) played in both gameA and gameB
  gameA  gameC        0
  gameA  gameD        0
  gameB  gameA        2         
  gameB  gameC        0
  gameB  gameD        0          
  ...

I can do this with a dictionary and a for loop, but is there a simple way to do it with a pivot table or a crosstab?

Thank you very much.


1 answer

Answer 1 · Posted 2024-06-16 10:17:03

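You can use pd.merge to join df with itself on player, which builds game_table with one row per (player, game_x, game_y) combination: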

game_table = pd.merge(df, df, how='left', on=['player'])
#    game_x player game_y
# 0   gameA      A  gameA
# 1   gameA      A  gameB
# 2   gameB      B  gameB
# 3   gameB      B  gameA
# 4   gameC      C  gameC
# 5   gameC      C  gameD
# 6   gameC      D  gameC
# 7   gameB      A  gameA
# 8   gameB      A  gameB
# 9   gameD      C  gameC
# 10  gameD      C  gameD
# 11  gameA      B  gameB
# 12  gameA      B  gameA

Then apply pd.crosstab to game_table to count how many players each pair of games shares:

freq = pd.crosstab(game_table['game_x'], game_table['game_y'])
# game_y  gameA  gameB  gameC  gameD
# game_x                            
# gameA       2      2      0      0
# gameB       2      2      0      0
# gameC       0      0      2      1
# gameD       0      0      1      1

Calling stack followed by reset_index reshapes the DataFrame into the desired form:

result = freq.stack().reset_index()

Putting it all together, with the rows where game_x equals game_y filtered out:

import pandas as pd
df = pd.DataFrame(
    {'player':['A', 'B', 'C', 'D', 'A', 'C', 'B'], 
     'game':['gameA', 'gameB', 'gameC', 'gameC', 'gameB', 'gameD', 'gameA']})

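# Self-join on player: one row per (game_x, game_y) pair that a player took part in.
game_table = pd.merge(df, df, how='left', on=['player'])
# Count the rows for each (game_x, game_y) pair, i.e. the players shared by both games.
freq = pd.crosstab(game_table['game_x'], game_table['game_y'])
# Reshape the square table into long form with an 'overlap' column.
result = freq.stack()
result.name = 'overlap'
result = result.reset_index()
# Drop the pairs that compare a game with itself.
mask = (result['game_x'] != result['game_y'])
result = result.loc[mask]
print(result)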

which yields

   game_x game_y  overlap
1   gameA  gameB        2  # Because both A and B played in gameA and gameB
2   gameA  gameC        0
3   gameA  gameD        0
4   gameB  gameA        2
6   gameB  gameC        0
7   gameB  gameD        0
8   gameC  gameA        0
9   gameC  gameB        0
11  gameC  gameD        1
12  gameD  gameA        0
13  gameD  gameB        0
14  gameD  gameC        1
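
As an alternative sketch (not part of the answer above), the same counts can be obtained without the self-merge by building a player-by-game indicator table and multiplying its transpose by itself. The names played and overlap and the column labels game1 and game2 are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'player': ['A', 'B', 'C', 'D', 'A', 'C', 'B'],
     'game': ['gameA', 'gameB', 'gameC', 'gameC', 'gameB', 'gameD', 'gameA']})

# Player-by-game indicator table: 1 if the player played the game, else 0.
# clip(upper=1) guards against repeated (player, game) rows.
played = pd.crosstab(df['player'], df['game']).clip(upper=1)

# (games x players) times (players x games): entry [g1, g2] counts the players in both.
overlap = played.T.dot(played)

# Give the two axes distinct names so reset_index does not hit a name clash.
overlap = overlap.rename_axis(index='game1', columns='game2')

# Long form, without the pairs that compare a game with itself.
result = overlap.stack().rename('overlap').reset_index()
result = result[result['game1'] != result['game2']]
print(result)
```

The diagonal of overlap holds the number of players per game, which is why those rows are filtered out at the end.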
