Add a column to a DataFrame that is a conditional sum from another DataFrame.

def get_team_salary(year, team):
    data_slice = salary_data_df[(salary_data_df.yearID == year) & (salary_data_df.teamID == team)]
    return data_slice['salary'].sum()

# This line of code works correctly without the next function in the code.
# team_data_df['team_salary'] = get_team_salary(2000, 'ANA')

def assign_team_salaries(team_data_df):
    year = team_data_df['yearID']
    team = team_data_df['teamID']
    return team_data_df.applymap(get_team_salary(year, team))

team_data_df['team_salary'] = assign_team_salaries(team_data_df)

1 answer

User

#1 · Posted on 2024-05-12 13:43:56

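As you mentioned, you can use .groupby on salary_data_df to compute the salary sum per year and team, and then merge those sums into team_data_df.

Take the following two small examples: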

print(team_data_df)
  teamID  yearID
0      a    2000
1      b    2000
2      c    2000
3      a    2001
4      b    2001
5      c    2001

print(salary_data_df)
   teamID  yearID  playerID  salary
0       a    2000         1     100
1       a    2000         2     200
2       b    2000         4     300
3       b    2000         5     400
4       b    2000         6     500
5       c    2000         7     600
6       a    2001         1     700
7       a    2001         2     800
8       a    2001         3     900
9       b    2001         4    1000
10      b    2001         5    1100
11      c    2001         7    1200
12      c    2001         8    1300

Then:

sums = (salary_data_df
        .groupby(by=['yearID', 'teamID'])['salary']
        .sum()
        .reset_index())
# alternative: pass `as_index=False` to `.groupby` instead of calling `.reset_index()`

res = team_data_df.merge(sums, on=['yearID', 'teamID'])

print(res)
  teamID  yearID  salary
0      a    2000     300
1      b    2000    1200
2      c    2000     600
3      a    2001    2400
4      b    2001    2100
5      c    2001    2500
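
For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the two example frames above and applies the same groupby-then-merge steps. It uses the `as_index=False` variant, and the rename to `team_salary` is my own addition to match the column name used in the question; it is not part of the original answer.

```python
import pandas as pd

# Rebuild the two example frames shown above.
team_data_df = pd.DataFrame({
    "teamID": ["a", "b", "c", "a", "b", "c"],
    "yearID": [2000, 2000, 2000, 2001, 2001, 2001],
})

salary_data_df = pd.DataFrame({
    "teamID": ["a", "a", "b", "b", "b", "c", "a", "a", "a", "b", "b", "c", "c"],
    "yearID": [2000] * 6 + [2001] * 7,
    "playerID": [1, 2, 4, 5, 6, 7, 1, 2, 3, 4, 5, 7, 8],
    "salary": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300],
})

# One row per (yearID, teamID) with the summed salary.
# as_index=False keeps the group keys as ordinary columns,
# so no reset_index() call is needed.
sums = (salary_data_df
        .groupby(["yearID", "teamID"], as_index=False)["salary"]
        .sum()
        .rename(columns={"salary": "team_salary"}))  # rename is an assumption, see above

# Attach the sums to the team table by matching on both keys.
res = team_data_df.merge(sums, on=["yearID", "teamID"])
print(res)
```

Computing all sums once and merging avoids filtering salary_data_df separately for every row of team_data_df, which is what the row-by-row function in the question was trying to do.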

You may also want to look at the `on` parameter of merge. Together with `how`, it mirrors SQL-style join specifications.
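
To illustrate the SQL-like behaviour, here is a small sketch of how the `how` parameter changes the result when a team has no matching salary rows. The team `d` and the tiny frames are hypothetical and made up for this illustration; filling missing sums with 0 is one possible choice, not something the original answer prescribes.

```python
import pandas as pd

# Hypothetical data: team "d" has no rows in the salary sums.
team_data_df = pd.DataFrame({"teamID": ["a", "d"], "yearID": [2000, 2000]})
sums = pd.DataFrame({"teamID": ["a"], "yearID": [2000], "salary": [300]})

# Default how="inner": only keys present in both frames survive,
# so team "d" is dropped (like an SQL INNER JOIN).
inner = team_data_df.merge(sums, on=["yearID", "teamID"])

# how="left": every row of team_data_df is kept (like an SQL LEFT JOIN);
# teams without salary data get NaN, replaced here with 0.
left = team_data_df.merge(sums, on=["yearID", "teamID"], how="left")
left["salary"] = left["salary"].fillna(0)

print(inner)
print(left)
```

If team_data_df must keep all of its rows, `how="left"` is probably the safer choice, since the default inner merge silently removes teams that have no salary records.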
