Python - 数据反透视

2 投票
2 回答
5531 浏览
提问于 2025-04-30 11:14

我在找一个用python的解决方案。需要帮助把一个数据框进行“反透视”,也就是把数据的结构调整一下。现在的数据结构有点奇怪,不太适合用基本的透视功能来重塑。

当前的数据框 - 这是我现在的样子

ABC Mechanical              

              Standard  15-Day  10-Day  5-Day
Terminal Units  0.49    0.75    0.69    0.63
Diffusers       0.35    0.55    0.45    0.4
Vent            0.8     0.95    0.9     0.85
Piping          0.7     0.85    0.8     0.75
Stoves          0.6     0.8     0.75    0.7

反透视后的数据框 - 这是我想要的样子

df.columns= Customer, Product Category, Ship Cycle, Multiplier
df.index= ABC Mechanical 

Customer    Product Category    Ship Cycle  Multiplier
ABC Mechanical  Terminal Units  Standard    0.49
ABC Mechanical  Terminal Units  15-Day      0.75
ABC Mechanical  Terminal Units  10-Day      0.69
ABC Mechanical  Terminal Units  5-Day       0.63
ABC Mechanical  Diffusers       Standard    0.35
ABC Mechanical  Diffusers       15-Day      0.55
ABC Mechanical  Diffusers       10-Day      0.45
ABC Mechanical  Diffusers       5-Day       0.4

任何帮助都非常感谢!

谢谢!

暂无标签

2 个回答

2

或者你可以使用 unstack()

In [41]:

print df
                Standard  15-Day  10-Day  5-Day
Terminal_Units      0.49    0.75    0.69   0.63
Diffusers           0.35    0.55    0.45   0.40
Vent                0.80    0.95    0.90   0.85
Piping              0.70    0.85    0.80   0.75
Stoves              0.60    0.80    0.75   0.70
In [42]:

df2 = df.stack().reset_index()
df2.columns = ['Product Category',    'Ship Cycle',  'Multiplier']
df2['Customer'] = 'ABC Mechanical'
In [43]:

print df2.icol([3,0,1,2])
          Customer Product Category Ship Cycle  Multiplier
0   ABC Mechanical   Terminal_Units   Standard        0.49
1   ABC Mechanical   Terminal_Units     15-Day        0.75
2   ABC Mechanical   Terminal_Units     10-Day        0.69
3   ABC Mechanical   Terminal_Units      5-Day        0.63
4   ABC Mechanical        Diffusers   Standard        0.35
5   ABC Mechanical        Diffusers     15-Day        0.55
6   ABC Mechanical        Diffusers     10-Day        0.45
7   ABC Mechanical        Diffusers      5-Day        0.40
8   ABC Mechanical             Vent   Standard        0.80
9   ABC Mechanical             Vent     15-Day        0.95
10  ABC Mechanical             Vent     10-Day        0.90
11  ABC Mechanical             Vent      5-Day        0.85
12  ABC Mechanical           Piping   Standard        0.70
13  ABC Mechanical           Piping     15-Day        0.85
14  ABC Mechanical           Piping     10-Day        0.80
15  ABC Mechanical           Piping      5-Day        0.75
16  ABC Mechanical           Stoves   Standard        0.60
17  ABC Mechanical           Stoves     15-Day        0.80
18  ABC Mechanical           Stoves     10-Day        0.75
19  ABC Mechanical           Stoves      5-Day        0.70
6

如果 df 看起来像这样:

In [26]: df
Out[26]: 
                Standard  15-Day  10-Day  5-Day
Terminal Units      0.49    0.75    0.69   0.63
Diffusers           0.35    0.55    0.45   0.40
Vent                0.80    0.95    0.90   0.85
Piping              0.70    0.85    0.80   0.75
Stoves              0.60    0.80    0.75   0.70

那么 pd.melt 可以让你接近想要的结果数据表:

In [27]: pd.melt(df.reset_index(), id_vars=['index']).sort_values(by=['index'])
Out[27]: 
             index  variable  value
1        Diffusers  Standard   0.35
6        Diffusers    15-Day   0.55
11       Diffusers    10-Day   0.45
16       Diffusers     5-Day   0.40
3           Piping  Standard   0.70
8           Piping    15-Day   0.85
13          Piping    10-Day   0.80
18          Piping     5-Day   0.75
4           Stoves  Standard   0.60
9           Stoves    15-Day   0.80
14          Stoves    10-Day   0.75
19          Stoves     5-Day   0.70
0   Terminal Units  Standard   0.49
5   Terminal Units    15-Day   0.75
10  Terminal Units    10-Day   0.69
15  Terminal Units     5-Day   0.63
2             Vent  Standard   0.80
7             Vent    15-Day   0.95
12            Vent    10-Day   0.90
17            Vent     5-Day   0.85

我不太明白“ABC Mechanical”在原始数据表中在哪里,所以我没有尝试把它包含在结果里。你可以这样重命名列名:

In [28]: df = pd.melt(df.reset_index(), id_vars=['index']).sort_values(by=['index'])
In [29]: df.columns = ['Product Category', 'Ship Cycle', 'Multiplier']
In [31]: df.head()
Out[31]: 
   Product Category Ship Cycle  Multiplier
1         Diffusers   Standard        0.35
6         Diffusers     15-Day        0.55
11        Diffusers     10-Day        0.45
16        Diffusers      5-Day        0.40
3            Piping   Standard        0.70

撰写回答