How to aggregate some columns while keeping others in Python

Posted 2024-04-26 07:31:39


I have a dataset with more than 100 variables, but to illustrate the problem I will simplify it as follows.


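I want to groupby() colA and colB and sum() colD, while taking the distinct values of colC and colE.

I tried the approaches below, but they only return the columns I grouped by and the column I summed, and never return colC and colE.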

Method 1: df.groupby(['colA', 'colB']).aggregate({'colD': 'sum'})

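Method 2: I could aggregate as above and then join the result back to the same table to get what I need.

Method 3: Put all the columns in the groupby, but when I do that I cannot group on missing values, such as the ones you can see in colE.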
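The obstacle in Method 3 can be sketched as follows, assuming pandas 1.1 or later, where `groupby` accepts `dropna=False` to keep rows whose group key is missing. The name `result` is only illustrative, and the data is the corrected sample from this question.

```python
import pandas as pd

data = [
    ["25-5-19", "cat1", "cat3", 10, 1],
    ["25-5-19", "cat1", "cat3", 20, 1],
    ["25-5-19", "cat1", "cat3", 30, 1],
    ["26-5-19", "cat2", "cat4", 50, 2],
    ["26-5-19", "cat2", "cat4", 100, 2],
    ["26-5-19", "cat2", "cat4", 10, 2],
    ["27-5-19", "cat1", "cat5", 40, None],
    ["27-5-19", "cat1", "cat5", 60, None],
]
df = pd.DataFrame(data, columns=["colA", "colB", "colC", "colD", "colE"])

# Group on every column that should be kept; dropna=False (assumed
# pandas >= 1.1) keeps the group whose colE key is NaN instead of dropping it.
result = (
    df.groupby(["colA", "colB", "colC", "colE"], dropna=False)["colD"]
      .sum()
      .reset_index()
)
print(result)
```

This only gives one row per (colA, colB) pair when colC and colE are constant within each pair, as they are in the sample data.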

What are my options?

Correction: I have just fixed the way I submitted the data earlier, which was incorrect.

data = [
    ["25-5-19", "cat1", "cat3", 10, 1],
    ["25-5-19", "cat1", "cat3", 20, 1],
    ["25-5-19", "cat1", "cat3", 30, 1],
    ["26-5-19", "cat2", "cat4", 50, 2],
    ["26-5-19", "cat2", "cat4", 100, 2],
    ["26-5-19", "cat2", "cat4", 10, 2],
    ["27-5-19", "cat1", "cat5", 40, None],
    ["27-5-19", "cat1", "cat5", 60, None]
] 



3 Answers

It looks like you need:

df['New'] = df.groupby(['colA', 'colB'])['colD'].transform('sum')
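As a rough illustration of how `transform` helps here: it returns a result aligned with the original rows, so the other columns survive, and the repeated rows can then be collapsed. This sketch assumes the numeric column to sum is colD, as in the corrected data; the names `colD_sum` and `summary` are hypothetical.

```python
import pandas as pd

data = [
    ["25-5-19", "cat1", "cat3", 10, 1],
    ["25-5-19", "cat1", "cat3", 20, 1],
    ["25-5-19", "cat1", "cat3", 30, 1],
    ["26-5-19", "cat2", "cat4", 50, 2],
    ["26-5-19", "cat2", "cat4", 100, 2],
    ["26-5-19", "cat2", "cat4", 10, 2],
    ["27-5-19", "cat1", "cat5", 40, None],
    ["27-5-19", "cat1", "cat5", 60, None],
]
df = pd.DataFrame(data, columns=["colA", "colB", "colC", "colD", "colE"])

# transform('sum') broadcasts each group's total back onto every row of the group
df["colD_sum"] = df.groupby(["colA", "colB"])["colD"].transform("sum")

# Drop the per-row values and collapse identical rows; drop_duplicates
# treats NaN values as equal, so the rows with missing colE collapse too.
summary = df.drop(columns="colD").drop_duplicates().reset_index(drop=True)
print(summary)
```

If colC or colE varies within a group, `summary` will contain one row per distinct combination rather than one row per group.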
import numpy as np
import pandas as pd

df = pd.DataFrame(data, columns=['colA', 'colB', 'colC', 'colD', 'colE'])

# I replaced all NaN with -1 to avoid the `Function does not reduce` error
df['colE'] = df['colE'].fillna(-1)

df.groupby(['colA', 'colB']).aggregate({'colD': 'sum', 'colC': np.unique, 'colE': np.unique})

I want to groupby() colA,colB and sum() colD, while taking the distinct values of colC and colE

So I believe you can use set in the aggregator, then call reset_index() once you are done.

import pandas as pd

# dataframe data from example (+ extra `cat1` in colC)
data = [
    ["25-5-19", "cat1", "cat1", 10, 1],
    ["25-5-19", "cat1", "cat3", 20, 1],
    ["25-5-19", "cat1", "cat3", 30, None],
    ["26-5-19", "cat2", "cat4", 50, 2],
    ["26-5-19", "cat2", "cat4", 100, 2],
    ["26-5-19", "cat2", "cat4", 10, 2]
] 

df = pd.DataFrame(data, columns = ['colA', 'colB', 'colC', 'colD', 'colE']) 

# aggregator sums over `colD` and gets distinct values of `colC` and `colE`
df.groupby(['colA', 'colB']).aggregate({'colD': sum, 'colC': set, 'colE': set}).reset_index()


|   | colA    | colB | colE       | colD | colC         |
|---|---------|------|------------|------|--------------|
| 0 | 25-5-19 | cat1 | {nan, 1.0} | 60   | {cat3, cat1} |
| 1 | 26-5-19 | cat2 | {2.0}      | 160  | {cat4}       |
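Sets are unordered and keep NaN as a member, so a variant that yields plain strings may be easier to work with downstream. The following is only a sketch under that assumption, using named aggregation; `distinct_joined` is a hypothetical helper, not part of pandas.

```python
import pandas as pd

data = [
    ["25-5-19", "cat1", "cat1", 10, 1],
    ["25-5-19", "cat1", "cat3", 20, 1],
    ["25-5-19", "cat1", "cat3", 30, None],
    ["26-5-19", "cat2", "cat4", 50, 2],
    ["26-5-19", "cat2", "cat4", 100, 2],
    ["26-5-19", "cat2", "cat4", 10, 2],
]
df = pd.DataFrame(data, columns=["colA", "colB", "colC", "colD", "colE"])


def distinct_joined(values):
    """Hypothetical helper: sorted distinct non-missing values as one string."""
    return ", ".join(sorted(values.dropna().astype(str).unique()))


# Named aggregation: output column = (input column, aggregation)
result = (
    df.groupby(["colA", "colB"])
      .agg(
          colD=("colD", "sum"),
          colC=("colC", distinct_joined),
          colE=("colE", distinct_joined),
      )
      .reset_index()
)
print(result)
```

Missing values are dropped before joining, so a group whose colE is entirely missing ends up with an empty string.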
