Python将一列值拆分为多列并保持其他列不变

1 投票
3 回答
67 浏览
提问于 2025-04-13 18:57

我有一些数据,如下所示。在这些数据中,我想把A列的值拆分成不同的列,并且要把对应的B列的值放在一起,同时C列要包含对应的组的值。请注意,这只是一个示例,实际上还有多个这样的组和它们对应的值。

输入数据

Column1 Column2 Column3
Group1  Value1  V1
Group1  Value2  V2
Group1  Value3  V3
Group1  Value4  V4
Group2  Value1  x1
Group2  Value2  x2
Group2  Value3  x3
Group2  Value4  x4
Group3  Value1  y1
Group3  Value2  y2

期望的输出结果:

Group1 Group2 Group3 Column3.Group1 Column3.Group2 Column3.Group3
Value1 Value1 Value1  v1             x1                y1
Value2 Value2 Value2  v2             x2                y1
Value3 Value3 NaN     v3             x3                NaN
Value4 Value4 NaN     v4             x4                NaN

有没有办法在Python中实现这个?我在论坛上搜索过,但没有找到……我对Python还很陌生,所以如果能有解释和解决方案就太好了。谢谢!

3 个回答

0

使用 GroupBy.cumcountDataFrame.pivot,最后在列表推导中扁平化 MultiIndex

out = (df.assign(g=df.groupby('Column1').cumcount())
         .pivot(index='g', columns='Column1')
         .rename_axis(None))
out.columns = [b if a == 'Column2' else f'{a}.{b}' for a, b in out.columns]
print (out)
   Group1  Group2  Group3 Column3.Group1 Column3.Group2 Column3.Group3
0  Value1  Value1  Value1             V1             x1             y1
1  Value2  Value2  Value2             V2             x2             y2
2  Value3  Value3     NaN             V3             x3            NaN
3  Value4  Value4     NaN             V4             x4            NaN

如果原始的数据表有更多的列,但只需要处理 Column1, Column2, Column3 这几列,可以添加参数 value

out = (df.assign(g=df.groupby('Column1').cumcount())
         .pivot(index='g', columns='Column1', values=['Column2','Column3'])
         .rename_axis(None))
out.columns = [b if a == 'Column2' else f'{a}.{b}' for a, b in out.columns]
print (out)
   Group1  Group2  Group3 Column3.Group1 Column3.Group2 Column3.Group3
0  Value1  Value1  Value1             V1             x1             y1
1  Value2  Value2  Value2             V2             x2             y2
2  Value3  Value3     NaN             V3             x3            NaN
3  Value4  Value4     NaN             V4             x4            NaN
0

这是一个经典的透视表操作,通常我们会去掉重复的列,但在这里我们是去掉重复的行。

为了做到这一点,我们需要先用 pivot 方法进行透视,前提是先用 cumcount 方法去掉重复的列:

out = (df
   .assign(index=df.groupby('Column1').cumcount())
   .pivot(index='index', columns='Column1')
   .rename_axis(index=None)
)

out.columns = out.columns.map('.'.join)

输出结果:

  Column2.Group1 Column2.Group2 Column2.Group3 Column3.Group1 Column3.Group2 Column3.Group3
0         Value1         Value1         Value1             V1             x1             y1
1         Value2         Value2         Value2             V2             x2             y2
2         Value3         Value3            NaN             V3             x3            NaN
3         Value4         Value4            NaN             V4             x4            NaN

对于你最开始问题中的原始格式:

out = (df
   .assign(index=df.groupby('Column1').cumcount())
   .pivot(index='index', columns='Column1').droplevel(0, axis=1)
   .rename_axis(index=None, columns=None)
)

输出结果:

   Group1  Group2  Group3 Group1 Group2 Group3
0  Value1  Value1  Value1     V1     x1     y1
1  Value2  Value2  Value2     V2     x2     y2
2  Value3  Value3     NaN     V3     x3    NaN
3  Value4  Value4     NaN     V4     x4    NaN

我个人建议保留多重索引,这样可以让层级关系更加清晰:

out = (df
   .assign(index=df.groupby('Column1').cumcount())
   .pivot(index='index', columns='Column1')
   .rename_axis(index=None, columns=[None, None])
)

输出结果:

  Column2                 Column3              
   Group1  Group2  Group3  Group1 Group2 Group3
0  Value1  Value1  Value1      V1     x1     y1
1  Value2  Value2  Value2      V2     x2     y2
2  Value3  Value3     NaN      V3     x3    NaN
3  Value4  Value4     NaN      V4     x4    NaN
0
tmp = df.assign(cc=df.groupby('Column1').cumcount())
out = pd.concat(
    [tmp.pivot(index='cc', columns='Column1', values='Column2'), 
     tmp.pivot(index='cc', columns='Column1', values='Column3').add_prefix('Column3.')
    ], axis=1).rename_axis(index=None, columns=None)

输出:

   Group1  Group2  Group3 Column3.Group1 Column3.Group2 Column3.Group3
0  Value1  Value1  Value1             V1             x1             y1
1  Value2  Value2  Value2             V2             x2             y2
2  Value3  Value3     NaN             V3             x3            NaN
3  Value4  Value4     NaN             V4             x4            NaN

撰写回答