Remove a substring and merge rows in Python/pandas

Posted 2024-05-15 03:45:15

My df:

   description               total      average      number
0 NFL football (white) L     49693        66       1007
1 NFL football (white) XL    79682        74       1198
2 NFL football (white) XS    84943        81       3792
3 NFL football (white) S     78371        73       3974
4 NFL football (blue) L      99482        92       3978
5 NFL football (blue) M      32192        51       3135
6 NFL football (blue) XL     75343        71       2879
7 NFL football (red) XXL     84391        79       1192
8 NFL football (red) XS      34727        57       992
9 NFL football (red) L       44993        63       1562

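What I want to do is remove the sizes, leaving the sum of total, the mean of average, and the sum of number for each football colour: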

   description               total      average    number
0 NFL football (white)       292689       74       9971
1 NFL football (blue)        207017       71       9992
2 NFL football (red)         164111       66       3746

Any suggestions would be much appreciated.


2 answers

Replacing works, but you can also use rsplit to drop the last word of the description and then do a groupby:

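# Drop the last space-separated word (the size) from each description
df.description = df.description.apply(lambda x: x.rsplit(' ',1)[0])

# Sum total and number for each remaining description
df.groupby(by='description')[['total', 'number']].sum()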
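
The replace-based approach mentioned above is not shown in this thread. A minimal sketch of one way it could look, assuming the size is always the last whitespace-separated token: the data is retyped from the question (with the description in row 6 taken as "(blue) XL"), and the names `key` and `result` are only illustrative.

```python
import pandas as pd

# Sample data retyped from the question (row 6 assumed to read "(blue) XL").
df = pd.DataFrame({
    "description": [
        "NFL football (white) L", "NFL football (white) XL",
        "NFL football (white) XS", "NFL football (white) S",
        "NFL football (blue) L", "NFL football (blue) M",
        "NFL football (blue) XL", "NFL football (red) XXL",
        "NFL football (red) XS", "NFL football (red) L",
    ],
    "total": [49693, 79682, 84943, 78371, 99482, 32192, 75343, 84391, 34727, 44993],
    "average": [66, 74, 81, 73, 92, 51, 71, 79, 57, 63],
    "number": [1007, 1198, 3792, 3974, 3978, 3135, 2879, 1192, 992, 1562],
})

# Strip the trailing size: whitespace followed by the final run of non-space characters.
key = df["description"].str.replace(r"\s+\S+$", "", regex=True)

# sort=False keeps the groups in order of first appearance (white, blue, red).
result = (
    df.groupby(key, sort=False)
      .agg({"total": "sum", "average": "mean", "number": "sum"})
      .reset_index()
)
print(result)
```

Unlike the two-line snippet above, this sketch leaves df unchanged and also averages the average column. The mean is left unrounded here.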

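You can group by a reformatted version of the description field without modifying the original column. The reformatting splits each description on spaces, drops the last part, and rejoins the rest, using .str.split(), .str[:-1] and .str.join(). Then aggregate with .agg().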

Rounding and converting to integers with .round().astype() then brings the output into the desired format.

(df.groupby(
            df['description'].str.split(' ').str[:-1].str.join(' ')
           )
   .agg({'total': 'sum', 'average': 'mean', 'number': 'sum'})
   .round(0)
   .astype(int)
).reset_index()

Result:

            description   total  average  number
0   NFL football (blue)  207017       71    9992
1    NFL football (red)  164111       66    3746
2  NFL football (white)  292689       74    9971
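
To see what the grouping key looks like on its own, here is a small sketch that applies the same split, slice, and join steps to three descriptions from the question. The variable names `s`, `parts`, and `key` are illustrative.

```python
import pandas as pd

s = pd.Series(
    ["NFL football (white) L", "NFL football (blue) XL", "NFL football (red) XXL"],
    name="description",
)

parts = s.str.split(" ")            # each row becomes a list of tokens
key = parts.str[:-1].str.join(" ")  # drop the last token (the size) and rejoin

print(key.tolist())
# ['NFL football (white)', 'NFL football (blue)', 'NFL football (red)']
```

Because the key is built as a separate Series, the description column in the original frame keeps its sizes.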
