Python: joining DataFrame rows that share the same value and aggregating string values

Posted 2024-05-13 20:45:34


I have a DataFrame of football players, with one record per player per year, like this:

    df
        player    position    team        stat2015    stat2016    stat2017    stat2018
    0   messi      Wing       Barca       9.85        nan         nan         nan
    1   messi      nan        Barca       nan         5.43        nan         nan
    2   messi      nan        Barca       nan         nan         3.56        nan
    3   dybala     Att        Palermo     15.85       nan         nan         nan
    4   messi      Att        Barca       nan         nan         nan         8.45
    5   dybala     Wing       Juve        nan         7.89        nan         nan
    6   higuain    Att        Napoli      13.22       nan         nan         nan
    7   dybala     Mid        Juve        nan         nan         13.89       nan
    8   higuain    nan        Juve        nan         11.33       nan         nan
    9   higuain    Att        Milan       nan         nan         nan         7.61
    10  ...        ...        ...         ...         ...         ...         ...

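What I'm trying to do is merge the rows that belong to the same player, fill the nan values with the statistic for the correct year, and keep a history of the positions and teams each player has played for. The output should look like this: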

    out_df
        player    position        team                    stat2015    stat2016    stat2017    stat2018
    0   messi     [Att,Wing]      Barca                   9.85        5.43        3.56        8.45
    1   dybala    [Att,Wing,Mid]  [Palermo,Juve]          15.85       7.89        13.89       0.0
    2   higuain   Att             [Napoli, Juve, Milan]   13.22       11.33       0.0         7.61
    3   ...       ...             ...                     ...         ...         ...         ...

This is what I have so far, but it doesn't seem to work the way I want:

    out_df = pd.DataFrame(columns = list(df.columns))
    for player in set(df.player):
        temp = df[df.apply(lambda row: row.astype(str).str.contains(player).any(), axis=1)]
        temp = temp.groupby('player').sum().reset_index()
        out_df = out_df.append(temp, sort = False, ignore_index=True)

Can anyone help me?


1 answer
User
#1 · Posted 2024-05-13 20:45:34

You can group by player and use agg with a different aggregation for each column, depending on the expected output:

    # Aggregate every stat column with 'first' (the first non-null value in each group)
    d = {col: 'first' for col in df.filter(like='stat').columns}
    # {'stat2015': 'first', 'stat2016': 'first', 'stat2017': 'first', 'stat2018': 'first'}
    # Collect the unique non-null values of a column (set order is not guaranteed)
    unique_vals = lambda x: list(set(x.dropna()))
    (df.groupby('player').agg({'position': unique_vals,
                               'team': unique_vals,
                               **d}).fillna(0))

                     position                   team  stat2015  stat2016  stat2017  stat2018
    player
    dybala   [Wing, Mid, Att]        [Juve, Palermo]     15.85      7.89     13.89      0.00
    higuain             [Att]  [Juve, Napoli, Milan]     13.22     11.33      0.00      7.61
    messi         [Wing, Att]                [Barca]      9.85      5.43      3.56      8.45
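
If the order of the history matters, or you want a plain value instead of a one-element list (as in the expected output in the question), a variation along these lines may help. This is only a sketch: the sample rows are copied from the question with the "..." row left out, and the helper names unique_in_order and unwrap_single are made up for illustration, not part of pandas.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (the "..." row is omitted).
df = pd.DataFrame({
    "player":   ["messi", "messi", "messi", "dybala", "messi",
                 "dybala", "higuain", "dybala", "higuain", "higuain"],
    "position": ["Wing", np.nan, np.nan, "Att", "Att",
                 "Wing", "Att", "Mid", np.nan, "Att"],
    "team":     ["Barca", "Barca", "Barca", "Palermo", "Barca",
                 "Juve", "Napoli", "Juve", "Juve", "Milan"],
    "stat2015": [9.85, np.nan, np.nan, 15.85, np.nan,
                 np.nan, 13.22, np.nan, np.nan, np.nan],
    "stat2016": [np.nan, 5.43, np.nan, np.nan, np.nan,
                 7.89, np.nan, np.nan, 11.33, np.nan],
    "stat2017": [np.nan, np.nan, 3.56, np.nan, np.nan,
                 np.nan, np.nan, 13.89, np.nan, np.nan],
    "stat2018": [np.nan, np.nan, np.nan, np.nan, 8.45,
                 np.nan, np.nan, np.nan, np.nan, 7.61],
})

stat_cols = list(df.filter(like="stat").columns)


def unique_in_order(values):
    """Hypothetical helper: unique non-null values, in order of first appearance."""
    return list(dict.fromkeys(values.dropna()))


def unwrap_single(items):
    """Hypothetical helper: return the bare value when the list has exactly one item."""
    return items[0] if len(items) == 1 else items


out_df = (
    df.groupby("player", sort=False)  # sort=False keeps players in order of first appearance
      .agg({"position": unique_in_order,
            "team": unique_in_order,
            **{col: "first" for col in stat_cols}})
      .reset_index()
)

# Fill only the numeric stat columns; the list columns are left alone.
out_df[stat_cols] = out_df[stat_cols].fillna(0)

# Turn one-element histories such as ['Barca'] into the plain value 'Barca'.
for col in ["position", "team"]:
    out_df[col] = out_df[col].map(unwrap_single)

print(out_df)
```

Compared with list(set(...)), dict.fromkeys keeps the values in the order they first appear in df, so the history reads chronologically as long as the rows are already sorted by year. Mixing plain strings and lists in one column matches the layout asked for, but it makes later processing harder; keeping every cell a list is usually easier to work with.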
