Stacking data from columns into rows in a DataFrame

Published 2024-06-16 09:58:43


I am trying to stack financial values by year in a pandas DataFrame, but I can't get started.

All I have so far is:

df1 = df.set_index(['refnum','y1gp','y2gp','y3gp']).stack()\
.reset_index(name='REV').rename(columns={'level_5':'Year'})

Current:

[table not preserved in the source]

Expected:

[table not preserved in the source]
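
The question's tables are not available here. Judging from the column patterns the answers match (y1, y1gp, y1rev, ...) and the values in their printed output, the input frame may have looked roughly like the sketch below. The values are copied from that output; the column order is an assumption.

```python
import pandas as pd

# Hypothetical reconstruction of the input: values come from the answers'
# printed output, the column order is assumed.
df = pd.DataFrame(
    {
        "refnum": [10001, 10002, 10003],
        "y1": [2021, 2020, 2021],
        "y1rev": [300, 300, 300],
        "y1gp": [200, 200, 200],
        "y2": [2022, 2021, 2022],
        "y2rev": [100, 200, 500],
        "y2gp": [600, 500, 500],
        "y3": [2023, 2022, 2023],
        "y3rev": [300, 300, 300],
        "y3gp": [300, 300, 300],
    }
)
print(df)
```

Each row is one refnum, with three groups of (year, rev, gp) columns that need to become three rows.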

3 Answers

Try:

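import re
import pandas as pd

# Rename e.g. "y1gp" -> "gp-1" so wide_to_long can split stub name and suffix
df.columns = [re.sub(r"y(\d+)(.*)", r"\2-\1", c) for c in df.columns]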
x = (
    pd.wide_to_long(
        df, stubnames=["", "gp", "rev"], sep="-", i="refnum", j="Base Year"
    )
    .rename(columns={"": "year"})
    .reset_index()
    .sort_values(by="refnum")
)
print(x)

Prints:

   refnum  Base Year  year   gp  rev
0   10001          1  2021  200  300
3   10001          2  2022  600  100
6   10001          3  2023  300  300
1   10002          1  2020  200  300
4   10002          2  2021  500  200
7   10002          3  2022  300  300
2   10003          1  2021  200  300
5   10003          2  2022  500  500
8   10003          3  2023  300  300
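
To see why the empty string appears as a stub name, it can help to look at what the re.sub call does to the labels on its own. A minimal sketch, assuming column names of the form y1, y1rev, y1gp (inferred from the pattern, not confirmed by the question):

```python
import re

# Assumed column labels; only the naming pattern matters here.
cols = ["refnum", "y1", "y1rev", "y1gp"]

# Move the digit to the end, after a "-" separator, and drop the leading "y".
renamed = [re.sub(r"y(\d+)(.*)", r"\2-\1", c) for c in cols]
print(renamed)  # ['refnum', '-1', 'rev-1', 'gp-1']
```

The bare year column y1 becomes "-1", so its stub is the empty string, which is why stubnames contains "" and the result is renamed to "year" afterwards.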

You can use pivot_longer from pyjanitor; for this case, pass a list of regular expressions to names_pattern, and pass the new column names in names_to:

# pip install pyjanitor
import janitor
import pandas as pd
df.pivot_longer(index='refnum',
                names_to=['year', 'REV', 'GP'],
                # raw strings avoid invalid-escape warnings for \d
                names_pattern=[r'^y\d$', r'.*rev$', r'.*gp$']
               )

   refnum  year  REV   GP
0   10001  2021  300  200
1   10002  2020  300  200
2   10003  2021  300  200
3   10001  2022  100  600
4   10002  2021  200  500
5   10003  2022  500  500
6   10001  2023  300  300
7   10002  2022  300  300
8   10003  2023  300  300

If you want to include the base year, you can modify the column labels that end in a digit before using pivot_longer:

(df.rename(columns = lambda col: f"{col}YEAR" 
                                 if col.endswith(('1','2','3')) 
                                 else col)
   .pivot_longer(index='refnum', 
                 names_to= ("Base Year", ".value"), 
                 names_pattern=r".(\d)(.+)", 
                 sort_by_appearance=True)
 )

   refnum Base Year  YEAR  rev   gp
0   10001         1  2021  300  200
1   10001         2  2022  100  600
2   10001         3  2023  300  300
3   10002         1  2020  300  200
4   10002         2  2021  200  500
5   10002         3  2022  300  300
6   10003         1  2021  300  200
7   10003         2  2022  500  500
8   10003         3  2023  300  300

The labels associated with .value stay as column headers, while the remaining labels are gathered into a new column (Base Year).
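
For readers without pyjanitor, roughly the same split can be approximated in plain pandas with melt, str.extract and pivot. This is only a sketch of the idea, not how pivot_longer is implemented; the sample frame is reconstructed from the output above, and the names long_df, parts and field are made up for illustration.

```python
import pandas as pd

# Assumed input, reconstructed from the printed output above.
df = pd.DataFrame(
    {
        "refnum": [10001, 10002, 10003],
        "y1": [2021, 2020, 2021], "y1rev": [300, 300, 300], "y1gp": [200, 200, 200],
        "y2": [2022, 2021, 2022], "y2rev": [100, 200, 500], "y2gp": [600, 500, 500],
        "y3": [2023, 2022, 2023], "y3rev": [300, 300, 300], "y3gp": [300, 300, 300],
    }
)

# Go fully long first: one row per (refnum, original column label).
long_df = df.melt(id_vars="refnum", var_name="label")

# Split each label into the digit (base year) and the remainder (field name).
parts = long_df["label"].str.extract(r"^y(\d)(.*)$")
long_df["Base Year"] = parts[0].astype(int)
long_df["field"] = parts[1].replace("", "year")  # bare "y1" holds the year

# Spread the field names back out as columns, like the .value placeholder.
out = (
    long_df.pivot(index=["refnum", "Base Year"], columns="field", values="value")
    .reset_index()
    .rename_axis(columns=None)
)
print(out)
```

Passing a list to pivot's index needs a reasonably recent pandas. The resulting columns come out in alphabetical order (gp, rev, year) rather than in order of appearance.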

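Let's use str.replace and str.split to turn the headers into a usable MultiIndex, then stack to go from wide format to long format. Then groupby with cumcount creates the BaseYear column.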

# Move refnum into the index so it is left out of the column reshaping
df = df.set_index('refnum')
# Create a MultiIndex with Numbers at the end and split into multiple levels
df.columns = (
    df.columns.str.replace(r'^(.*?)(\d+)(.*)$', r'\1\3/\2', regex=True)
        .str.split('/', expand=True)
)
# Wide Format to Long + Rename Columns
df = df.stack().droplevel(-1).reset_index().rename(
    columns={'y': 'Year', 'ygp': 'GP', 'yrev': 'REV'}
)
# Add Base Year Column
df['BaseYear'] = "BaseYear+" + df.groupby('refnum').cumcount().astype(str)
# df['BaseYear'] = df.groupby('refnum').cumcount()  # (int version)

df

   refnum  Year   GP  REV    BaseYear
0   10001  2021  200  300  BaseYear+0
1   10001  2022  600  100  BaseYear+1
2   10001  2023  300  300  BaseYear+2
3   10002  2020  200  300  BaseYear+0
4   10002  2021  500  200  BaseYear+1
5   10002  2022  300  300  BaseYear+2
6   10003  2021  200  300  BaseYear+0
7   10003  2022  500  500  BaseYear+1
8   10003  2023  300  300  BaseYear+2
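
Note that cumcount numbers the rows within each group starting from 0, whereas the other answers label the base years 1 to 3. A small sketch of a 1-based variant, using a made-up long_df that contains only the refnum column:

```python
import pandas as pd

# Toy frame standing in for the stacked result; only refnum matters here.
long_df = pd.DataFrame({"refnum": [10001, 10001, 10001, 10002, 10002, 10002]})

# cumcount is 0-based within each group, so add 1 for 1-based base years.
long_df["BaseYear"] = long_df.groupby("refnum").cumcount() + 1
print(long_df["BaseYear"].tolist())  # [1, 2, 3, 1, 2, 3]
```

This relies on the rows for each refnum already being in year order, which is the case after the stack step above.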
