Displaying data with groupby/pivot when one column's entries become the new column labels

Posted 2024-06-07 15:52:05


I want to summarize the generating capacity of power plants using Python and pandas (previous question).

For this task the data has to be grouped/pivoted, and the entries of the "Technology" column should become the column labels.

Here is my input:

Plant Name,Nameplate Capacity,Technology,...
Barry,153.1,Natural Gas Steam Turbine,..
Barry,153.1,Natural Gas Steam Turbine,..
Barry,403.7,Conventional Steam Coal,..
Barry,788.8,Conventional Steam Coal,..
Barry,195.2,Natural Gas Fired Combined Cycle,..
Barry,195.2,Natural Gas Fired Combined Cycle,..

And the desired output:

Plant Name,Natural Gas Steam Turbine,Conventional Steam Coal,Natural Gas Fired Combined Cycle,..
Barry,306.2,1192.5,390.4,..

I tried several commands, but none of them worked:

df.groupby(['Plant Name', 'Technology']).sum().pivot('Plant Name', 'Technology').fillna(0)

or

#with numpy as np
res = df.pivot_table(index=["Plant Name"], columns=["Plant Name"], values=["Technology"], aggfunc=np.sum)

A follow-up question

How can I find the largest entry in each row and add it as a new column (for example, "Conventional Steam Coal" in my sample)?


2 Answers

I think you need to change the column names passed to pivot_table and add the fill_value parameter:

res = df.pivot_table(index="Plant Name", 
                     columns="Technology", 
                     values="Nameplate Capacity", 
                     aggfunc=np.sum,
                     fill_value=0).reset_index()
print (res)
Technology Plant Name  Conventional Steam Coal  \
0               Barry                   1192.5   

Technology  Natural Gas Fired Combined Cycle  Natural Gas Steam Turbine  
0                                      390.4                      306.2  
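
To check the pivot_table call against the sample rows from the question, here is a minimal self-contained sketch. It assumes the elided "..." columns can simply be dropped, and it passes the string "sum" instead of np.sum, which yields the same totals here.

```python
import io

import pandas as pd

# Sample rows copied from the question; the elided "..." columns are omitted.
csv_text = """Plant Name,Nameplate Capacity,Technology
Barry,153.1,Natural Gas Steam Turbine
Barry,153.1,Natural Gas Steam Turbine
Barry,403.7,Conventional Steam Coal
Barry,788.8,Conventional Steam Coal
Barry,195.2,Natural Gas Fired Combined Cycle
Barry,195.2,Natural Gas Fired Combined Cycle
"""
df = pd.read_csv(io.StringIO(csv_text))

# index   -> one output row per plant
# columns -> each distinct Technology value becomes a column label
# values  -> the numbers that are summed within each plant/technology cell
res = df.pivot_table(
    index="Plant Name",
    columns="Technology",
    values="Nameplate Capacity",
    aggfunc="sum",
    fill_value=0,
).reset_index()

print(res)
```

The resulting columns come out in alphabetical order of the technology names, so they may need reordering to match the desired output exactly.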

Your first attempt can be fixed by selecting the column to aggregate with sum, and by using unstack instead of pivot to reshape:

res = (df.groupby(['Plant Name', 'Technology'])['Nameplate Capacity']
         .sum()
         .unstack(fill_value=0)
         .reset_index())
print (res)
Technology Plant Name  Conventional Steam Coal  \
0               Barry                   1192.5   

Technology  Natural Gas Fired Combined Cycle  Natural Gas Steam Turbine  
0                                      390.4                      306.2  
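
The effect of fill_value only shows up when some plant lacks a technology, which the single-plant sample does not demonstrate. The sketch below adds an invented second plant ("Hypothetical Plant B", not part of the question's data) purely to create such a gap.

```python
import pandas as pd

# The "Barry" rows come from the question; "Hypothetical Plant B" is invented
# here only to create a plant/technology combination with no data.
df = pd.DataFrame(
    {
        "Plant Name": ["Barry", "Barry", "Barry", "Hypothetical Plant B"],
        "Nameplate Capacity": [153.1, 153.1, 403.7, 100.0],
        "Technology": [
            "Natural Gas Steam Turbine",
            "Natural Gas Steam Turbine",
            "Conventional Steam Coal",
            "Conventional Steam Coal",
        ],
    }
)

# Sum capacity per (plant, technology) pair; the result is a Series
# with a two-level index.
summed = df.groupby(["Plant Name", "Technology"])["Nameplate Capacity"].sum()

# Move the "Technology" index level into the columns.
with_nan = summed.unstack()               # missing combination stays NaN
with_zero = summed.unstack(fill_value=0)  # missing combination becomes 0

print(with_nan)
print(with_zero)
```

Whether 0 or NaN is the better placeholder depends on whether "no such unit at this plant" should count as zero capacity in later calculations.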

Your arguments to pd.pivot_table are misaligned: columns specifies the category labels, while values specifies the data to aggregate.

In addition, you should use 'sum' rather than np.sum, because pandas is optimized to pick the appropriate algorithm when given a string:

res = df.pivot_table(index='Plant Name', columns='Technology',
                     values='Nameplate Capacity', aggfunc='sum')

print(res)

Technology  Conventional Steam Coal  Natural Gas Fired Combined Cycle  \
Plant Name                                                              
Barry                        1192.5                             390.4   

Technology  Natural Gas Steam Turbine  
Plant Name                             
Barry                           306.2  
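
Neither answer addresses the follow-up question about the largest entry in each row. One possible approach, offered as a sketch rather than something confirmed in this thread, is DataFrame.idxmax(axis=1), which returns the column label of the row-wise maximum. The new column names "Largest Technology" and "Largest Capacity" are made up for illustration.

```python
import pandas as pd

# Sample rows from the question (elided "..." columns omitted).
df = pd.DataFrame(
    {
        "Plant Name": ["Barry"] * 6,
        "Nameplate Capacity": [153.1, 153.1, 403.7, 788.8, 195.2, 195.2],
        "Technology": [
            "Natural Gas Steam Turbine",
            "Natural Gas Steam Turbine",
            "Conventional Steam Coal",
            "Conventional Steam Coal",
            "Natural Gas Fired Combined Cycle",
            "Natural Gas Fired Combined Cycle",
        ],
    }
)

res = df.pivot_table(
    index="Plant Name",
    columns="Technology",
    values="Nameplate Capacity",
    aggfunc="sum",
    fill_value=0,
)

# Remember the technology columns before adding new ones, so that the
# row-wise max is computed over capacities only.
tech_cols = list(res.columns)

# Hypothetical column names, chosen here for illustration.
res["Largest Technology"] = res[tech_cols].idxmax(axis=1)  # label of the max
res["Largest Capacity"] = res[tech_cols].max(axis=1)       # the max itself

print(res)
```

If two technologies tie for the maximum, idxmax reports the first one in column order.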
