Pandas pivot table ValueError: Index contains duplicate entries, cannot reshape

Index                     1                                          2                                          3
Sample_Name               20170824_ELN147926_HexLacCer_Plasma_A-1-1  20170824_ELN147926_HexLacCer_Plasma_A-1-1  20170824_ELN147926_HexLacCer_Plasma_A-1-1
Sample_ID                 NaN                                        NaN                                        NaN
Sample_Type               Unknown                                    Unknown                                    Unknown
IS                        True                                       True                                       False
Component_Name            GluCer(d18:1/12:0)_LCB_264.3               GluCer(d18:1/17:0)_LCB_264.3               GluCer(d18:1/16:0)_LCB_264.3
IS_Name                   NaN                                        NaN                                        GluCer(d18:1/17:0)_LCB_264.3
Component_Group_Name      NaN                                        NaN                                        NaN
Outlier_Reasons           NaN                                        NaN                                        NaN
Actual_Concentration      0.1                                        0.1                                        NaN
Area                      2.733532e+06                               2.945190e+06                               3.993535e+06
Height                    5.963840e+05                               5.597470e+05                               8.912731e+05
Retention_Time            2.963911                                   2.745026                                   2.791991
Width_at_50_pct           0.068676                                   0.068086                                   0.059864
Used                      True                                       True                                       True
Calculated_Concentration  NaN                                        NaN                                        125.927659773487
Accuracy                  NaN                                        NaN                                        NaN

2 Answers

User

#1 · edited 2024-05-23 19:49:20

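You can use groupby() and unstack() to avoid the error you see with pivot().

Here is some sample data, with a few edge cases added and some column values removed or substituted to make it an MCVE: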

# df
      Sample_Name  Sample_ID     IS Component_Name Calculated_Concentration Outlier_Reasons
Index                                                                    
1             foo        NaN   True              x                  NaN              NaN  
1             foo        NaN   True              y                  NaN              NaN 
2             foo        NaN   False             z            125.92766              NaN 
2             bar        NaN   False             x                 1.00              NaN  
2             bar        NaN   False             y                 2.00              NaN  
2             bar        NaN   False             z                  NaN              NaN  

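# Group by sample and component, keep the first non-null concentration
# in each group, then move Component_Name into the columns.
(df.groupby(['Sample_Name', 'Component_Name'])
   .Calculated_Concentration
   .first()
   .unstack()
)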

Output:

Component_Name    x    y          z
Sample_Name                        
bar             1.0  2.0        NaN
foo             NaN  NaN  125.92766
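
To compare the two approaches, here is a minimal, self-contained sketch. It rebuilds the sample frame from this answer (only the three columns that matter) and then appends one hypothetical row that repeats the (bar, x) pair. That extra row is an assumption added here to trigger the error; it is not part of the answer's data. With it, pivot() should raise the ValueError from the title, while groupby().first().unstack() still returns a table.

```python
import numpy as np
import pandas as pd

# Sample frame from the answer, reduced to the columns used for reshaping.
df = pd.DataFrame(
    {
        "Sample_Name": ["foo", "foo", "foo", "bar", "bar", "bar"],
        "Component_Name": ["x", "y", "z", "x", "y", "z"],
        "Calculated_Concentration": [np.nan, np.nan, 125.92766, 1.0, 2.0, np.nan],
    },
    index=pd.Index([1, 1, 2, 2, 2, 2], name="Index"),
)

# Hypothetical extra row (not in the answer): a second (bar, x) measurement.
dup = pd.DataFrame(
    {
        "Sample_Name": ["bar"],
        "Component_Name": ["x"],
        "Calculated_Concentration": [3.0],
    },
    index=pd.Index([3], name="Index"),
)
df_dup = pd.concat([df, dup])

# pivot() cannot decide which of the two (bar, x) values to keep.
try:
    df_dup.pivot(
        index="Sample_Name",
        columns="Component_Name",
        values="Calculated_Concentration",
    )
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)

# groupby + first() reduces each (sample, component) pair to a single value
# before unstack() reshapes it, so duplicates no longer block the reshape.
wide = (
    df_dup.groupby(["Sample_Name", "Component_Name"])["Calculated_Concentration"]
    .first()
    .unstack()
)

print(pivot_error)
print(wide)
```

Note that first() keeps the first non-null value in each group, so for the duplicated (bar, x) pair it returns 1.0 and silently drops 3.0. If that is not what you want, swap in mean(), max(), or another aggregation.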

User

#2 · edited 2024-05-23 19:49:20

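You should be able to do what you want with the pandas.pivot_table() function, which is described in the pandas documentation.

With your data frame stored as df, use the following code: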

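import pandas as pd

# Placeholder path: replace it with the file that holds the table.
df = pd.read_table('table_from_which_to_read')

# One row per sample, one column per component.
new_df = pd.pivot_table(df, index=['Sample_Name'], columns='Component_Name', values='Calculated_Concentration')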

If you want something other than the mean of the concentration values, change the aggfunc parameter.
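
As a rough illustration of aggfunc, the sketch below uses a small made-up frame in which the (bar, x) pair occurs twice. The values are hypothetical and not taken from the question. It compares the default aggregation (the mean) with "max" and "first".

```python
import pandas as pd

# Hypothetical data: (bar, x) appears twice, with values 3.0 and 5.0.
df = pd.DataFrame(
    {
        "Sample_Name": ["foo", "foo", "bar", "bar", "bar"],
        "Component_Name": ["x", "y", "x", "x", "y"],
        "Calculated_Concentration": [1.0, 2.0, 3.0, 5.0, 4.0],
    }
)

# Arguments shared by all three calls.
common = dict(
    index="Sample_Name",
    columns="Component_Name",
    values="Calculated_Concentration",
)

mean_table = pd.pivot_table(df, **common)                    # default: mean
max_table = pd.pivot_table(df, aggfunc="max", **common)      # largest value
first_table = pd.pivot_table(df, aggfunc="first", **common)  # first value seen

print(mean_table)
print(max_table)
print(first_table)
```

For the duplicated (bar, x) cell the three tables hold 4.0, 5.0 and 3.0 respectively; cells backed by a single row are the same in all three.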

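Edit

Since you don't want to aggregate the values, you can reshape the data with the DataFrame's set_index function, which is described in the pandas documentation.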


The resulting table should look like your desired result, and it will have a MultiIndex.
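
The answer's own snippet did not survive on this page, so the following is only a sketch of what a set_index-based reshape might look like, not the author's code. It assumes the index keys are Sample_Name and Component_Name and uses a cut-down version of the sample data from the first answer. The final unstack() step is an addition for comparison with the pivoted layout. Unlike the groupby approach, this route only works when every (Sample_Name, Component_Name) pair is unique.

```python
import numpy as np
import pandas as pd

# Cut-down sample data; every (Sample_Name, Component_Name) pair is unique.
df = pd.DataFrame(
    {
        "Sample_Name": ["foo", "foo", "foo", "bar", "bar", "bar"],
        "Component_Name": ["x", "y", "z", "x", "y", "z"],
        "Calculated_Concentration": [np.nan, np.nan, 125.92766, 1.0, 2.0, np.nan],
    }
)

# Two keys give a MultiIndex; verify_integrity=True makes set_index raise
# immediately if a key pair is duplicated, instead of failing later.
indexed = df.set_index(["Sample_Name", "Component_Name"], verify_integrity=True)

# Optional extra step (not stated in the answer): spread Component_Name
# across the columns to get one row per sample.
wide = indexed["Calculated_Concentration"].unstack()

print(indexed)
print(wide)
```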
