Python pandas: merging groupby results

>>> merged_df.dtypes
Jurisdiction                  object
AdjustedVolume               float64
EffectiveStartDate    datetime64[ns]
VintageYear                    int64
ProductType                   object
Rate                         float32
Obligation                   float32
Demand                       float64
Cost                         float64
dtype: object

>>> merged_df.groupby(['Jurisdiction', 'VintageYear'])['AdjustedVolume'].sum()
Jurisdiction  VintageYear  AdjustedVolume
CA            2017         3.529964e+05

>>> merged_df.groupby(['Jurisdiction', 'VintageYear','ProductType'])['AdjustedVolume'].sum()
Jurisdiction  VintageYear  ProductType  AdjustedVolume
CA            2017         Bucket1      7.584832e+04
CA            2017         Bucket2      1.308454e+05
CA            2017         Bucket3      1.463026e+05

>>> df1.dtypes
Jurisdiction                  object
ProductType                   object
VintageYear                    int64
EffectiveStartDate    datetime64[ns]
Rate                         float32
Obligation                   float32
dtype: object

>>> df2.dtypes
Jurisdiction                  object
AdjustedVolume               float64
EffectiveStartDate    datetime64[ns]
VintageYear                    int64
dtype: object

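2 Answers

User

Answer 1 · edited 2024-04-19 06:08:51

You could also consider using transform to retrieve a group aggregate inline with the other records, similar to a subquery aggregate in SQL.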

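grpdf = merged_df.groupby(['Jurisdiction', 'VintageYear','ProductType'])['AdjustedVolume']\
                 .sum().reset_index()

# Call transform on grpdf itself so the result aligns with grpdf's rows; the total is
# taken per Jurisdiction and VintageYear, matching the first groupby in the question.
grpdf['TotalAdjVolume'] = grpdf.groupby(['Jurisdiction', 'VintageYear'])['AdjustedVolume']\
                               .transform('sum')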
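
To make the transform idea concrete, here is a minimal sketch on a made-up frame (the rows and the names `sample` and `by_product` are illustrative assumptions, not the asker's data). Since transform returns a result aligned to the rows of the frame it is called on, the sketch applies it to the already-aggregated frame.

```python
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration.
sample = pd.DataFrame({
    "Jurisdiction": ["CA", "CA", "CA", "CA"],
    "VintageYear": [2017, 2017, 2017, 2017],
    "ProductType": ["Bucket1", "Bucket1", "Bucket2", "Bucket3"],
    "AdjustedVolume": [10.0, 5.0, 20.0, 30.0],
})

# One row per (Jurisdiction, VintageYear, ProductType).
by_product = (
    sample.groupby(["Jurisdiction", "VintageYear", "ProductType"])["AdjustedVolume"]
    .sum()
    .reset_index()
)

# transform("sum") broadcasts each group's total back onto every row of that group,
# so the result has the same length and index as by_product.
by_product["TotalAdjVolume"] = (
    by_product.groupby(["Jurisdiction", "VintageYear"])["AdjustedVolume"]
    .transform("sum")
)

print(by_product)
```

Each product row keeps its own volume and gains the total for its jurisdiction and vintage year, with no separate merge step.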

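User

Answer 2 · edited 2024-04-19 06:08:51

You can run two versions of the groupby and merge the two tables. The first table is a groupby that includes ProductType, which breaks the adjusted volume down by ProductType.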

df = df.groupby(['Jurisdiction','VintageYear','ProductType']).agg({'AdjustedVolume':'sum'}).reset_index(drop = False)

Then create another table that does not include ProductType (this is where the total comes from).

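# Reconstructed from context (the original snippet was lost in extraction).
df1 = df.groupby(['Jurisdiction','VintageYear']).agg({'AdjustedVolume':'sum'}).reset_index(drop = False)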

Now create an ID column in both tables so that the merge works properly.

df['ID'] = df['Jurisdiction'].astype(str)+'_' +df['VintageYear'].astype(str)
df1['ID'] = df1['Jurisdiction'].astype(str)+'_'+ df1['VintageYear'].astype(str)

Now merge on the IDs to get the total adjusted volume.

df = pd.merge(df, df1, left_on = ['ID'], right_on = ['ID'], how = 'inner')
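
As an alternative to the concatenated ID column, `pd.merge` also accepts a list of key columns in `on`. The sketch below is one possible variation, not the answer's own method; the frames `by_product` and `totals` and their values are hypothetical stand-ins for the two grouped tables. Renaming the total column before merging avoids the `_x`/`_y` suffixes.

```python
import pandas as pd

# Hypothetical per-product table standing in for the first grouped frame.
by_product = pd.DataFrame({
    "Jurisdiction": ["CA", "CA", "NY"],
    "VintageYear": [2017, 2017, 2017],
    "ProductType": ["Bucket1", "Bucket2", "Bucket1"],
    "AdjustedVolume": [10.0, 20.0, 7.0],
})

# Totals per (Jurisdiction, VintageYear), renamed up front so no suffixes are needed.
totals = (
    by_product.groupby(["Jurisdiction", "VintageYear"], as_index=False)["AdjustedVolume"]
    .sum()
    .rename(columns={"AdjustedVolume": "TotalAdjustedVolume"})
)

# Merge directly on both key columns instead of a synthetic ID.
merged = pd.merge(by_product, totals, on=["Jurisdiction", "VintageYear"], how="inner")

print(merged)
```

Merging on the real key columns keeps their dtypes intact and leaves no duplicate key columns to delete afterwards.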

The last step is to clean up the columns.

df = df.rename(columns = {'AdjustedVolume_x':'AdjustedVolume',
                          'AdjustedVolume_y':'TotalAdjustedVolume',
                          'Jurisdiction_x':'Jurisdiction',
                          'VintageYear_x':'VintageYear'})
del df['Jurisdiction_y']
del df['VintageYear_y']

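The result has one row per ProductType, with AdjustedVolume holding that product's volume and TotalAdjustedVolume holding the total for its Jurisdiction and VintageYear.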
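
The steps of this answer can be run end to end as a rough sketch. The input below starts from the per-product sums displayed in the question, which are rounded for display, so the computed total may differ from the question's 3.529964e+05 in the last digit. The name `result` and the extra drop of the helper `ID` column are assumptions added here.

```python
import pandas as pd

# Per-product volumes as displayed in the question (rounded by the REPL display).
df = pd.DataFrame({
    "Jurisdiction": ["CA", "CA", "CA"],
    "VintageYear": [2017, 2017, 2017],
    "ProductType": ["Bucket1", "Bucket2", "Bucket3"],
    "AdjustedVolume": [7.584832e+04, 1.308454e+05, 1.463026e+05],
})

# Totals per (Jurisdiction, VintageYear), without ProductType.
df1 = (
    df.groupby(["Jurisdiction", "VintageYear"])
    .agg({"AdjustedVolume": "sum"})
    .reset_index()
)

# Composite key built from both grouping columns, as in the answer.
df["ID"] = df["Jurisdiction"].astype(str) + "_" + df["VintageYear"].astype(str)
df1["ID"] = df1["Jurisdiction"].astype(str) + "_" + df1["VintageYear"].astype(str)

# Overlapping non-key columns receive the default _x / _y suffixes.
result = pd.merge(df, df1, on="ID", how="inner")

# Restore readable names and drop the duplicated key columns and the helper ID.
result = result.rename(columns={
    "AdjustedVolume_x": "AdjustedVolume",
    "AdjustedVolume_y": "TotalAdjustedVolume",
    "Jurisdiction_x": "Jurisdiction",
    "VintageYear_x": "VintageYear",
})
result = result.drop(columns=["Jurisdiction_y", "VintageYear_y", "ID"])

print(result)
```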
