如何将数据框从长格式转换为宽格式并带有“分组”列？

0 投票

3 回答

77 浏览

提问于 2025-04-12 17:09

在将下面这个数据表从长格式转换为宽格式时，我想把一些列分成“组”，并给它们加上前缀或后缀。

这些元素的组可以有不同的大小，也就是说，可能由一个、两个或更多的元素/行组成。为了让例子简单，我这里用的是两个一组。

import pandas as pd

df = pd.DataFrame(
    [
        {'group': 'group-009297534',  'single_id': 'single-011900051',  'country': 'ESP',  'name': '00000911'},
        {'group': 'group-009297534',  'single_id': 'single-000000821',  'country': 'USA',  'name': '00001054'},
        {'group': 'group-009280053',  'single_id': 'single-000000002',  'country': 'HUN',  'name': '00000496'},
        {'group': 'group-009280053',  'single_id': 'single-000000014',  'country': 'HUN',  'name': '00000795'},
        {'group': 'group-009245039',  'single_id': 'single-000001258',  'country': 'NOR',  'name': '00000527'},
        {'group': 'group-009245039',  'single_id': 'single-000000669',  'country': 'TWN',  'name': '00000535'}
    ]
)

我给要分组的元素分配一个索引，然后用这个索引来命名列，这个方法已经朝着正确的方向走了，但结果还是和我预期的有些偏差。

df['idx'] = df.groupby('group').cumcount()
df.pivot(index='group', columns='idx')

group	('single_id', 0)	('single_id', 1)	('country', 0)	('country', 1)	('name', 0)	('name', 1)
group-009245039	single-000001258	single-000000669	NOR	TWN	00000527	00000535
group-009280053	single-000000002	single-000000014	HUN	HUN	00000496	00000795
group-009297534	single-011900051	single-000000821	ESP	USA	00000911	00001054

不过，我预期的结果应该是这样的：

	group	single_id_1	country_1	name_1	single_id_2	country_2	name_2
0	group-009245039	single-000001258	NOR	00000527	single-000000669	TWN	00000535
1	group-009280053	single-000000002	HUN	00000496	single-000000014	HUN	00000795
2	group-009297534	single-011900051	ESP	00000911	single-000000821	USA	00001054

我不太确定用多重索引的方法是否合适，因为这可能需要进行排序和合并，或者是否有更简单的办法。

数据处理多重索引分组列宽格式数据重塑数据框转换长格式列命名

3 个回答

你可以使用来自janitor的 pivot_wider 函数。

import janitor
df['cols'] = df.groupby('group').cumcount() + 1
df.pivot_wider(index = 'group', names_from = 'cols')

            group      single_id_1      single_id_2 country_1 country_2 name_1 name_2
0 group-009245039 single-000001258 single-000000669       NOR       TWN    527    535
1 group-009280053 single-000000002 single-000000014       HUN       HUN    496    795
2 group-009297534 single-011900051 single-000000821       ESP       USA    911   1054

同样，你也可以使用来自siuba的 pivot_wider 函数：

from siuba import _
from siuba.experimental.pivot import pivot_wider

df['cols'] = df.groupby('group').cumcount() + 1
pivot_wider(df, names_from = _.cols, 
                values_from = _['single_id', 'country', 'name'],
                names_vary = 'slowest')

            group      single_id_1 country_1 name_1      single_id_2 country_2 name_2
0 group-009245039 single-000001258       NOR    527 single-000000669       TWN    535
1 group-009280053 single-000000002       HUN    496 single-000000014       HUN    795
2 group-009297534 single-011900051       ESP    911 single-000000821       USA   1054

回答于 2025-04-12 由 Python大师

分享举报

这就是你想要的东西吗？

import pandas as pd

df = pd.DataFrame([
    {'group': 'group-009297534',  'single_id': 'single-011900051',  'country': 'ESP',  'name': '00000911'},
    {'group': 'group-009297534',  'single_id': 'single-000000821',  'country': 'USA',  'name': '00001054'},
    {'group': 'group-009280053',  'single_id': 'single-000000002',  'country': 'HUN',  'name': '00000496'},
    {'group': 'group-009280053',  'single_id': 'single-000000014',  'country': 'HUN',  'name': '00000795'},
    {'group': 'group-009245039',  'single_id': 'single-000001258',  'country': 'NOR',  'name': '00000527'},
    {'group': 'group-009245039',  'single_id': 'single-000000669',  'country': 'TWN',  'name': '00000535'}
])

df['idx'] = (df.groupby('group').cumcount() + 1).astype(str)


# Pivot the DataFrame with the new 'idx' for differentiation
df_pivoted = df.pivot(index='group', columns='idx')

# Flatten the MultiIndex and format column names
df_pivoted.columns = [f'{x[0]}_{x[1]}' for x in df_pivoted.columns]

# Reset the index to bring 'group' back as a column
df_pivoted.reset_index(inplace=True)

# Optional: Reorder the columns according to your expected output
# This assumes you know the order and number of groups
expected_order = [
    'group', 
    'single_id_1', 'country_1', 'name_1',
    'single_id_2', 'country_2', 'name_2'
]
df_pivoted = df_pivoted[expected_order]

print(df_pivoted)

输出结果：

  group       single_id_1 country_1    name_1       single_id_2  \
0  group-009245039  single-000001258       NOR  00000527  single-000000669   
1  group-009280053  single-000000002       HUN  00000496  single-000000014   
2  group-009297534  single-011900051       ESP  00000911  single-000000821   

  country_2    name_2  
0       TWN  00000535  
1       HUN  00000795  
2       USA  00001054

回答于 2025-04-12 由 Python大师

分享举报

代码

out = (df
       .assign(index=df.groupby('group').cumcount().add(1).astype('str'))
       .pivot(index='group', columns='index')
       .pipe(lambda x: x.set_axis(x.columns.map('_'.join), axis=1))
       .sort_index(axis=1, 
                   key=lambda x: x.str.extract(r'.+_(\d+$)')[0].astype('int'))
       .reset_index()
)

输出

回答于 2025-04-12 由 Python大师

分享举报

如何将数据框从长格式转换为宽格式并带有“分组”列？

3 个回答

撰写回答