How do I group by and sum multiple columns in a CSV file?

date        id  name    unitCount  orderCount  invoiceCount
2020-02-12  1   Guitar  200        100         200
2020-02-12  2   Drums   300        200         100
2020-02-12  3   Piano   400        700         300
2020-02-11  1   Guitar  100        500         300
2020-02-11  2   Drums   200        400         400
2020-02-11  3   Piano   300        300         100

date        id  name    total_unitCount  total_orderCount  total_invoiceCount
2020-02-12  1   Guitar  300              600               500
2020-02-12  2   Drums   500              600               500
2020-02-12  3   Piano   700              1000              400

3 Answers

User

#1 · edited 2024-05-14 15:00:10

Just call sum() on the groupby object, rename the columns accordingly, and write the resulting DataFrame to a CSV file.

The following should do it:

import pandas as pd

df = pd.read_csv(r'path/to/myfile.csv', sep=';')

# select the columns to sum with a list (double brackets);
# recent pandas versions reject the bare tuple form ['a', 'b', 'c']
df.groupby(['id', 'name'])[['unitCount', 'orderCount', 'invoiceCount']] \
  .sum() \
  .rename(columns={'unitCount': 'total_unitCount', 'orderCount': 'total_orderCount', 'invoiceCount': 'total_invoiceCount'}) \
  .to_csv('path/to/myoutputfile_sum.csv', sep=';')
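A minimal self-contained sketch of the same idea, run against the sample rows from the question. The in-memory `sample` string, the space separator, and the use of `add_prefix` in place of the explicit `rename` mapping are assumptions made for illustration; they are not part of the answer above. Because `date` is not one of the group keys, it does not appear in the result.

```python
import io
import pandas as pd

# Sample rows from the question, space-separated here for illustration only.
sample = """date id name unitCount orderCount invoiceCount
2020-02-12 1 Guitar 200 100 200
2020-02-12 2 Drums 300 200 100
2020-02-12 3 Piano 400 700 300
2020-02-11 1 Guitar 100 500 300
2020-02-11 2 Drums 200 400 400
2020-02-11 3 Piano 300 300 100
"""
df = pd.read_csv(io.StringIO(sample), sep=" ")

count_cols = ["unitCount", "orderCount", "invoiceCount"]

totals = (
    df.groupby(["id", "name"])[count_cols]  # list selection, not a tuple
      .sum()                                # sum each count column per group
      .add_prefix("total_")                 # unitCount -> total_unitCount, etc.
      .reset_index()                        # turn id and name back into columns
)
print(totals)
```

Calling `totals.to_csv(...)` with `index=False` would then write only these five columns.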

User

#2 · edited 2024-05-14 15:00:10

You can do the following:

# group rows by 'id' column
df = df.groupby('id', as_index=False).agg({'date': 'max',
                                           'name': 'first',
                                           'unitCount': 'sum',
                                           'orderCount': 'sum',
                                           'invoiceCount': 'sum'})

# change the order of the columns
df = df[['date', 'id', 'name', 'unitCount', 'orderCount', 'invoiceCount']]

# set the new column names
df.columns = ['date', 'id', 'name', 'total_unitCount', 'total_orderCount', 'total_invoiceCount']

# save the dataframe as .csv file
df.to_csv('path/to/myfile_sum.csv', index=False)

User

#3 · edited 2024-05-14 15:00:10

You can use a manual agg:

(df.groupby('id', as_index=False)
   .agg({'date':'max', 'name':'first',
         'unitCount':'sum',
         'orderCount':'sum',
         'invoiceCount':'sum'})
   .to_csv('file.csv', index=False)
)
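The dictionary-style agg above keeps the original column names. As a sketch of one way to get the `total_` names from the question in the same step, pandas (0.25 and later) also accepts keyword-style "named aggregation", where each keyword is the output column and the tuple is (input column, function). The inline DataFrame below is an assumption standing in for the CSV file.

```python
import pandas as pd

# Stand-in for the CSV contents shown in the question (assumed for illustration).
df = pd.DataFrame({
    "date": ["2020-02-12"] * 3 + ["2020-02-11"] * 3,
    "id": [1, 2, 3, 1, 2, 3],
    "name": ["Guitar", "Drums", "Piano"] * 2,
    "unitCount": [200, 300, 400, 100, 200, 300],
    "orderCount": [100, 200, 700, 500, 400, 300],
    "invoiceCount": [200, 100, 300, 300, 400, 100],
})

result = df.groupby("id", as_index=False).agg(
    date=("date", "max"),        # latest date per id (ISO strings sort correctly)
    name=("name", "first"),      # name is constant within an id
    total_unitCount=("unitCount", "sum"),
    total_orderCount=("orderCount", "sum"),
    total_invoiceCount=("invoiceCount", "sum"),
)

# Put date first to match the layout requested in the question.
result = result[["date", "id", "name",
                 "total_unitCount", "total_orderCount", "total_invoiceCount"]]
print(result)
```

This avoids a separate renaming step; the trade-off is that every output column has to be spelled out explicitly.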
