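Split a DataFrame column containing lists into multiple columns

3 answers

User

#1 · edited 2024-06-12 09:32:31

When you turn lists into DataFrame data, string items that are not separated by commas get merged into a single value. To avoid this, put commas between the items before building the DataFrame, as shown below: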

import pandas as pd
data={"task":["S101-10061","S101-10069","S101-10078","S101-10088","S101-10100","S101-10102","S101-10133","S101YGBgZ2"],
     "m_label":[['Cecum Landmark','ICV' ,'Comment' ,'Appendiceal orifice'],['Rectum RF','ICV','Cecum Landmark','TI','Comment','Transverse']
               ,['Appendiceal orifice' ,'ICV' ,'Cecum Landmark', 'Comment', 'Transverse','Rectum RF'],['Cecum Landmark', 'ICV', 'Comment', 'Appendiceal orifice'],
               ['Transverse' ,'Appendiceal orifice', 'ICV', 'Cecum Landmark', 'Comment'],['Rectum RF' ,'ICV' ,'Cecum Landmark', 'Comment' ,'TI' ,'Transverse','Appendiceal orifice'],
               ['Rectum RF', 'Transverse' ,'ICV' ,'Cecum Landmark', 'Comment'],['Comment']]}
data=pd.DataFrame(data)

The DataFrame should then look like this:

        task    m_label
0   S101-10061  [Cecum Landmark, ICV, Comment, Appendiceal ori...
1   S101-10069  [Rectum RF, ICV, Cecum Landmark, TI, Comment, ...
2   S101-10078  [Appendiceal orifice, ICV, Cecum Landmark, Com...
3   S101-10088  [Cecum Landmark, ICV, Comment, Appendiceal ori...
4   S101-10100  [Transverse, Appendiceal orifice, ICV, Cecum L...
5   S101-10102  [Rectum RF, ICV, Cecum Landmark, Comment, TI, ...
6   S101-10133  [Rectum RF, Transverse, ICV, Cecum Landmark, C...
7   S101YGBgZ2  [Comment]
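
The merging problem described at the top of this answer can be reproduced in plain Python. The sketch below assumes the answer is referring to Python's implicit concatenation of adjacent string literals; the variable names are made up for illustration.

```python
import pandas as pd

# Adjacent string literals without commas are concatenated by Python itself.
no_commas = ['Cecum Landmark' 'ICV' 'Comment']
with_commas = ['Cecum Landmark', 'ICV', 'Comment']

print(no_commas)    # a single merged string inside the list
print(with_commas)  # three separate items

# Only the comma-separated list can later be spread over several columns.
df = pd.DataFrame({"m_label": [no_commas, with_commas]})
lengths = df["m_label"].str.len()  # number of items in each list
print(lengths.tolist())
```

If the first length printed is 1, the labels were merged before pandas ever saw them, so the fix belongs in the input data rather than in the splitting code.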

Code that produces the output:

import numpy as np
data=pd.concat([data["task"],data["m_label"].apply(lambda x:pd.Series(x).add_prefix("m_label"))],axis=1).replace(np.nan," ")

   task        m_label0             m_label1             m_label2        m_label3             m_label4    m_label5    m_label6
0  S101-10061  Cecum Landmark       ICV                  Comment         Appendiceal orifice
1  S101-10069  Rectum RF            ICV                  Cecum Landmark  TI                   Comment     Transverse
2  S101-10078  Appendiceal orifice  ICV                  Cecum Landmark  Comment              Transverse  Rectum RF
3  S101-10088  Cecum Landmark       ICV                  Comment         Appendiceal orifice
4  S101-10100  Transverse           Appendiceal orifice  ICV             Cecum Landmark       Comment
5  S101-10102  Rectum RF            ICV                  Cecum Landmark  Comment              TI          Transverse  Appendiceal orifice
6  S101-10133  Rectum RF            Transverse           ICV             Cecum Landmark       Comment
7  S101YGBgZ2  Comment
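
As a possible alternative to calling pd.Series on every row, the wide frame can be built in one step from the list of lists. This is only a sketch on a three-row sample of the data above; the names labels and wide are made up for illustration.

```python
import pandas as pd

# Small sample in the same shape as the answer's data (lists of unequal length).
data = pd.DataFrame({
    "task": ["S101-10061", "S101-10069", "S101YGBgZ2"],
    "m_label": [
        ["Cecum Landmark", "ICV", "Comment", "Appendiceal orifice"],
        ["Rectum RF", "ICV", "Cecum Landmark", "TI", "Comment", "Transverse"],
        ["Comment"],
    ],
})

# Build the wide frame directly from the list of lists;
# rows shorter than the longest list are padded with missing values.
labels = pd.DataFrame(data["m_label"].tolist(), index=data.index).add_prefix("m_label")

# Put the task column back next to the expanded labels and blank out missing cells.
wide = pd.concat([data["task"], labels], axis=1).fillna("")
print(wide)
```

On large frames this usually avoids the per-row overhead of apply, though the result should be the same as the answer's approach apart from using "" instead of " " for the empty cells.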

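User

#2 · edited 2024-06-12 09:32:31

Use str.findall with a regular expression that captures everything enclosed in single quotes, then apply pd.Series to turn the matches into columns. Note that this only works when m_label holds the string form of each list rather than actual list objects.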

df=df.set_index('task')['m_label'].str.findall('\'(.*?)\'').apply(pd.Series)
df.columns = [f'm_label{i+1}' for i in df]

Output:

                       m_label1             m_label2        m_label3               m_label4    m_label5    m_label6             m_label7  
task                                                                                                                                       
S101-10061       Cecum Landmark                  ICV         Comment    Appendiceal orifice         NaN         NaN                  NaN   
S101-10069            Rectum RF                  ICV  Cecum Landmark                     TI     Comment  Transverse                  NaN   
S101-10078  Appendiceal orifice                  ICV  Cecum Landmark                Comment  Transverse   Rectum RF                  NaN   
S101-10088       Cecum Landmark                  ICV         Comment    Appendiceal orifice         NaN         NaN                  NaN   
S101-10100           Transverse  Appendiceal orifice             ICV         Cecum Landmark     Comment         NaN                  NaN   
S101-10102            Rectum RF                  ICV  Cecum Landmark                Comment          TI  Transverse  Appendiceal orifice   
S101-10133            Rectum RF           Transverse             ICV         Cecum Landmark     Comment         NaN                  NaN   
S101YGBgZ2              Comment                  NaN             NaN                    NaN         NaN         NaN                  NaN

If needed, you can reset the index afterwards and then call fillna('').
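
The follow-up step could look roughly like this. The sketch is self-contained and assumes m_label is stored as the string form of each list (for example after reading a CSV); the two-row sample and the names parts and result are made up for illustration.

```python
import pandas as pd

# Assumed input: m_label holds the *string* form of each list.
df = pd.DataFrame({
    "task": ["S101-10061", "S101YGBgZ2"],
    "m_label": ["['Cecum Landmark', 'ICV', 'Comment']", "['Comment']"],
})

# Capture everything between single quotes, then expand each match list into columns.
parts = df.set_index("task")["m_label"].str.findall(r"'(.*?)'").apply(pd.Series)
parts.columns = [f"m_label{i + 1}" for i in parts.columns]

# Turn task back into a regular column and replace NaN with empty strings.
result = parts.reset_index().fillna("")
print(result)
```

A raw string is used for the pattern here so the quotes do not need escaping; it matches the same text as the pattern in the answer.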

User

#3 · edited 2024-06-12 09:32:31

To add to pyguy's answer: if you want to rename the columns "on the fly", you can use add_prefix():

df.set_index('task')['m_label'].str.findall('\'(.*?)\'').apply(pd.Series).add_prefix('m_label')

Output:

Out[27]: 
                  m_label0 m_label1  ... m_label4    m_label5
task                                 ...                     
S101-10061  Cecum Landmark      ICV  ...      NaN         NaN
S101-10069       Rectum RF      ICV  ...  Comment  Transverse
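
One difference from the previous answer is that add_prefix() keeps the zero-based column numbers. A minimal sketch of both naming styles, using a made-up two-row frame named parts:

```python
import pandas as pd

# Stand-in for the expanded label columns (integer column names 0, 1).
parts = pd.DataFrame([["Cecum Landmark", "ICV"], ["Comment", None]])

# add_prefix keeps the existing numbering: m_label0, m_label1.
zero_based = parts.add_prefix("m_label")

# rename with a function gives one-based names: m_label1, m_label2.
one_based = parts.rename(columns=lambda i: f"m_label{i + 1}")

print(list(zero_based.columns))
print(list(one_based.columns))
```

Which numbering to use is a matter of preference; both calls return a new DataFrame and leave parts unchanged.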
