How do I pivot longer while also filling columns from fields in the column names?

Posted 2024-05-29 06:57:24


I have a DataFrame:

df = pd.DataFrame( {'Sony | TV | Model | value': {0: 'A222', 1: 'A234', 2: 'A4345'}, 'Sony | TV | Quantity | value': {0: 5, 1: 5, 2: 4}, 'Sony | TV | Max-quant | value': {0: 10, 1: 9, 2: 9}, 'Panasonic | TV | Model | value': {0: 'T232', 1: 'S3424', 2: 'X3421'}, 'Panasonic | TV | Quantity | value': {0: 1, 1: 5, 2: 1}, 'Panasonic | TV | Max-quant | value': {0: 10, 1: 12, 2: 11}, 'Sanyo | Radio | Model | value': {0: 'S111', 1: 'S1s1', 2: 'S1s2'}, 'Sanyo | Radio | Quantity | value': {0: 4, 1: 2, 2: 4}, 'Sanyo | Radio | Max-quant | value': {0: 9, 1: 9, 2: 10}} )

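Each column name consists of four fields separated by ` | `: manufacturer, device, a field name (Model, Quantity, or Max-quant), and the literal `value`. I need to reshape the table into a longer format while also parsing the information held in the column names. The output should look like this: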

Manufacturer    Device  Model   Quantity    Max quantity
Sony              TV    A222       5            10
Sony              TV    A234       5            9
Sony              TV    A4345      4            9
Panasonic         TV    T232       1            10
Panasonic         TV    S3424      5            12
Panasonic         TV    X3421      1            11
Sanyo             Radio S111       4            9
Sanyo             Radio S1s1       2            9
Sanyo             Radio S1s2       4            10

In R, I would use pivot_longer with names_pattern, followed by pivot_wider.

How can I achieve this in Python?


2 answers

One option is the pivot_longer function from pyjanitor, which abstracts the wide-to-long reshaping:

# pip install pyjanitor
import pandas as pd
import janitor as jn

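# `import janitor` registers pivot_longer as a DataFrame method.
# Each \s\|\s consumes the spaces around a pipe, so the captured fields carry no stray whitespace.
df.pivot_longer(index = None,
                names_to = ('Manufacturer', 'Device', '.value'),
                names_pattern = r"(.+)\s\|\s(.+)\s\|\s(.+)\s\|.+")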
 
  Manufacturer  Device  Model   Quantity   Max-quant
0         Sony      TV   A222          5          10
1         Sony      TV   A234          5           9
2         Sony      TV  A4345          4           9
3    Panasonic      TV   T232          1          10
4    Panasonic      TV  S3424          5          12
5    Panasonic      TV  X3421          1          11
6        Sanyo   Radio   S111          4           9
7        Sanyo   Radio   S1s1          2           9
8        Sanyo   Radio   S1s2          4          10
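To make the `.value` placeholder more concrete, here is a rough pandas-only sketch of the same idea: melt to long form, split the column names into fields, then pivot the third field back out into columns. The intermediate names `row`, `column`, and `field` are made up for illustration, and this is only one way to do it, not necessarily how pyjanitor works internally.

```python
import pandas as pd

df = pd.DataFrame({
    'Sony | TV | Model | value': ['A222', 'A234', 'A4345'],
    'Sony | TV | Quantity | value': [5, 5, 4],
    'Sony | TV | Max-quant | value': [10, 9, 9],
    'Panasonic | TV | Model | value': ['T232', 'S3424', 'X3421'],
    'Panasonic | TV | Quantity | value': [1, 5, 1],
    'Panasonic | TV | Max-quant | value': [10, 12, 11],
    'Sanyo | Radio | Model | value': ['S111', 'S1s1', 'S1s2'],
    'Sanyo | Radio | Quantity | value': [4, 2, 4],
    'Sanyo | Radio | Max-quant | value': [9, 9, 10],
})

# Wide -> long: one row per (original row, original column).
# `row` (hypothetical name) keeps the original row label so cells can be regrouped later.
long = (
    df.melt(ignore_index=False, var_name='column')
      .rename_axis('row')
      .reset_index()
)

# Split each column name on the pipes; the fourth piece ("value") is not needed.
parts = long['column'].str.split(r'\s*\|\s*', expand=True, regex=True)
long['Manufacturer'], long['Device'], long['field'] = parts[0], parts[1], parts[2]

# Long -> wide on the third field, which is the role `.value` plays in pivot_longer.
result = (
    long.pivot(index=['Manufacturer', 'Device', 'row'], columns='field', values='value')
        .reset_index()
        .drop(columns='row')
        .rename_axis(columns=None)
)

# Restore the requested column order and let pandas re-infer numeric dtypes.
result = result[['Manufacturer', 'Device', 'Model', 'Quantity', 'Max-quant']].infer_objects()
print(result)
```

Note that `pivot` sorts its index, so the rows come out ordered by Manufacturer (Panasonic, Sanyo, Sony) rather than in the original column order.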

Try this:

df = pd.DataFrame( {'Sony | TV | Model | value': {0: 'A222', 1: 'A234', 2: 'A4345'}, 
                    'Sony | TV | Quantity | value': {0: 5, 1: 5, 2: 4}, 
                    'Sony | TV | Max-quant | value': {0: 10, 1: 9, 2: 9}, 
                    'Panasonic | TV | Model | value': {0: 'T232', 1: 'S3424', 2: 'X3421'}, 
                    'Panasonic | TV | Quantity | value': {0: 1, 1: 5, 2: 1}, 
                    'Panasonic | TV | Max-quant | value': {0: 10, 1: 12, 2: 11}, 
                    'Sanyo | Radio | Model | value': {0: 'S111', 1: 'S1s1', 2: 'S1s2'}, 
                    'Sanyo | Radio | Quantity | value': {0: 4, 1: 2, 2: 4}, 
                    'Sanyo | Radio | Max-quant | value': {0: 9, 1: 9, 2: 10}} )

# Create a multiIndex column header
df.columns = pd.MultiIndex.from_arrays(zip(*df.columns.str.split(r'\s?\|\s?')))

# Reshape the DataFrame using `stack`, `droplevel`, and `set_index`
df.stack([0,1]).droplevel(1, axis=1).set_index('Model', append=True)\
               .rename_axis([None,'Manufacturer', 'Device', 'Model'])\
               .sort_index(level=[1,2,3])\
               .reset_index().drop('level_0', axis=1)
     

Output:

  Manufacturer Device  Model  Max-quant  Quantity
0    Panasonic     TV  S3424       12.0       5.0
1    Panasonic     TV   T232       10.0       1.0
2    Panasonic     TV  X3421       11.0       1.0
3        Sanyo  Radio   S111        9.0       4.0
4        Sanyo  Radio   S1s1        9.0       2.0
5        Sanyo  Radio   S1s2       10.0       4.0
6         Sony     TV   A222       10.0       5.0
7         Sony     TV   A234        9.0       5.0
8         Sony     TV  A4345        9.0       4.0
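To inspect what the MultiIndex step in this answer produces, the small sketch below builds the same kind of four-level header on a shortened list of column names. It uses `expand=True` on the string split instead of the `zip` idiom, and the level names (`Field`, `Kind`) are invented here for readability; the answer itself leaves the levels unnamed.

```python
import pandas as pd

# A shortened set of column names in the same "a | b | c | d" format.
columns = pd.Index([
    'Sony | TV | Model | value',
    'Sony | TV | Quantity | value',
    'Sanyo | Radio | Model | value',
])

# On an Index, expand=True returns a MultiIndex with one level per field.
header = columns.str.split(r'\s?\|\s?', expand=True, regex=True)

# Hypothetical level names, added only to make the printout easier to read.
header.names = ['Manufacturer', 'Device', 'Field', 'Kind']

print(header.nlevels)
print(header.get_level_values('Manufacturer').unique().tolist())
print(header.get_level_values('Field').tolist())
```

Once the header has these levels, `stack([0, 1])` can move the manufacturer and device levels from the columns into the row index, which is what the reshaping chain above relies on.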
