How to collapse multiple rows into one row and turn column elements into lists

Posted 2024-06-13 00:11:18


I have a DataFrame that looks like this:

        tags      categories                                          classification
    0   label     ['legislative', 'law, govt and politics', 'exe...   None
    0   document  ['legislative', 'law, govt and politics', 'exe...   NaN
    0   text      ['legislative', 'law, govt and politics', 'exe...   NaN
    0   paper     ['legislative', 'law, govt and politics', 'exe...   NaN
    0   poster    ['legislative', 'law, govt and politics', 'exe...   NaN
        

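I want to create a new DataFrame that collapses the one above into the one below, so that the elements of the "tags" and "classification" columns are merged into a single row, with each cell holding the individual items as a list, for example: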

        tags                                               categories                                          classification
    0   ['label', 'document', 'text', 'paper', 'poster']   ['legislative', 'law, govt and politics', 'exe...   ['None', 'NaN', 'NaN', 'NaN', 'NaN']

How can I do this? How can I use stack or groupby to get this result? Thanks in advance.

*Here is the output of df.to_dict():

    {'tags': {0: ' letter',
              1: ' head',
              2: ' water',
              3: ' art',
              4: ' indoors',
              5: ' flyer',
              6: ' poster',
              ...},
     'categories': {0: "['legislative', 'law, govt and politics', 'executive branch', 'work', 'society', 'government']",
                    1: "['unrest and war', 'society', 'religion and spirituality', 'buddhism']",
                    2: '[]',
                    3: '[]',
                    4: "['unemployment', 'society', 'law, govt and politics', 'foreign policy', 'work', 'politics', 'armed forces']",
                    5: '[]',
                    6: "['sports', 'law, govt and politics', 'wrestling']",
                    ...},
     'classfication': {0: nan,
                       1: nan,
                       2: nan,
                       3: nan,
                       4: nan,
                       5: nan,
                       6: nan,
                       ...}}

1 answer
User
#1 · Posted 2024-06-13 00:11:18

I don't fully understand your question, but is this the kind of thing you want?

df:

    trial_num   subject samples
0   1           1       [-1.74, -0.78, -0.11]
1   2           1       [0.86, 0.21, -0.01]
2   3           1       [2.04, 0.6, -0.79]
3   1           2       [0.52, 0.49, 1.56]
4   2           2       [0.07, 0.84, -1.1]
5   3           2       [0.43, -1.3, 1.99]

Transformed df:

     trial_num          subject             samples
0   [1, 2, 3, 1, 2, 3]  [1, 1, 1, 2, 2, 2]  [[-1.74, -0.78, -0.11], [0.86, 0.21, -0.0...

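import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'trial_num': [1, 2, 3, 1, 2, 3],
     'subject': [1, 1, 1, 2, 2, 2],
     'samples': [np.random.randn(3).round(2).tolist() for _ in range(6)]
    }
)
# Collapse every column into a single list, giving a one-row DataFrame.
# Joining the values as strings and splitting on ',' would turn the numbers
# into strings and break the nested 'samples' lists at their inner commas.
df = pd.DataFrame({col: [df[col].tolist()] for col in df.columns})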
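
For the frame in the question itself, the groupby route the asker mentions might look roughly like the sketch below. The sample frame is a hypothetical reconstruction, not the asker's real data: it assumes the five rows share index label 0, that "categories" holds the same stringified list in every row of a group (as the to_dict() dump suggests), and that "classification" is all NaN. The name `collapsed` is made up for illustration.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's data: five rows sharing index 0,
# with 'categories' stored as the string form of a list.
cats = "['legislative', 'law, govt and politics', 'executive branch']"
df = pd.DataFrame(
    {
        'tags': ['label', 'document', 'text', 'paper', 'poster'],
        'categories': [cats] * 5,
        'classification': [np.nan] * 5,
    },
    index=[0, 0, 0, 0, 0],
)

# Group on the shared index label. 'tags' and 'classification' are gathered
# into lists; 'categories' keeps its first value because it is assumed to
# repeat within each group.
collapsed = df.groupby(level=0).agg(
    {'tags': list, 'categories': 'first', 'classification': list}
)

# Turn the stringified list into a real Python list.
collapsed['categories'] = collapsed['categories'].apply(ast.literal_eval)

print(collapsed)
```

If the rows do not share an index label, grouping on a constant key, or building the one-row frame directly from `df[col].tolist()`, should give the same shape.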
