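I have a question about splitting lists stored in a DataFrame column into multiple columns, where each value has to go into a specific column.
Suppose I have this DataFrame: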
date data
2020-01-01 00:00:00 [G07, G08, G10, G16]
2020-01-01 00:00:01 [G07, G08, G16]
2020-01-01 00:00:02 [G08, G10, G16, G20, G21]
2020-01-01 00:00:03 [G16, G20, G21, G26, G27, R02]
2020-01-01 00:00:04 [G07, G08, G26, G27]
I am looking for a result like this:
date G07 G08 G10 G16 G20 G21 G26 G27 R02
2020-01-01 00:00:00 G07 G08 G10 G16 NaN NaN NaN NaN NaN
2020-01-01 00:00:01 G07 G08 NaN G16 NaN NaN NaN NaN NaN
2020-01-01 00:00:02 NaN G08 G10 G16 G20 G21 NaN NaN NaN
2020-01-01 00:00:03 NaN NaN NaN G16 G20 G21 G26 G27 R02
2020-01-01 00:00:04 G07 G08 NaN NaN NaN NaN G26 G27 NaN
so that I end up with a matrix like this:
date G07 G08 G10 G16 G20 G21 G26 G27 R02
2020-01-01 00:00:00 1 1 1 1 0 0 0 0 0
2020-01-01 00:00:01 1 1 0 1 0 0 0 0 0
2020-01-01 00:00:02 0 1 1 1 1 1 0 0 0
2020-01-01 00:00:03 0 0 0 1 1 1 1 1 1
2020-01-01 00:00:04 1 1 0 0 0 0 1 1 0
By running a command like this:
In [1] pd.DataFrame(self.df['data'].to_list())
Out [1] date 1 2 3 4 5 6
2020-01-01 00:00:00 G07 G08 G10 G16
2020-01-01 00:00:01 G07 G08 G16
2020-01-01 00:00:02 G08 G10 G16 G20 G21
2020-01-01 00:00:03 G16 G20 G21 G26 G27 R02
2020-01-01 00:00:04 G07 G08 G26 G27
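I can only split the lists into separate positional columns, but I cannot find a way to put each value into its own specific column.
I have considered looping over every value for every date, but that is very slow, and my dataset has more than 1,000,000 rows.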
Try the join(), strip(), get_dummies(), and drop() methods:
Output of out:
Another approach:
Prints:
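A minimal sketch of how the join(), get_dummies(), and drop() combination might look on the sample data from the question. It assumes the data column holds real Python lists; if it holds strings such as "[G07, G08]" instead, something like .str.strip("[]") with sep=", " would presumably take the place of the .str.join step. The variable names dummies and out are illustrative only.

```python
import pandas as pd

# Sample frame from the question; assumes `data` holds Python lists.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2020-01-01 00:00:00",
                "2020-01-01 00:00:01",
                "2020-01-01 00:00:02",
                "2020-01-01 00:00:03",
                "2020-01-01 00:00:04",
            ]
        ),
        "data": [
            ["G07", "G08", "G10", "G16"],
            ["G07", "G08", "G16"],
            ["G08", "G10", "G16", "G20", "G21"],
            ["G16", "G20", "G21", "G26", "G27", "R02"],
            ["G07", "G08", "G26", "G27"],
        ],
    }
)

# Join each list into one comma-separated string, then build one
# 0/1 indicator column per distinct label (columns come out sorted).
dummies = df["data"].str.join(",").str.get_dummies(sep=",")

# Attach the indicator columns and drop the original list column.
out = df.join(dummies).drop(columns="data")

print(out)
```

This stays vectorized, so it avoids the per-row Python loop mentioned in the question. If the labeled form with NaN is needed rather than 0/1, one option is to mask the indicator columns afterwards, for example with DataFrame.where.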
Check out MultiLabelBinarizer from sklearn.
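As a rough sketch of that suggestion, MultiLabelBinarizer can be fitted directly on the column of lists and its output wrapped back into a DataFrame. This assumes scikit-learn is installed and that the data column holds Python lists; the names mlb, encoded, and result are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Sample frame from the question; assumes `data` holds Python lists.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2020-01-01 00:00:00",
                "2020-01-01 00:00:01",
                "2020-01-01 00:00:02",
                "2020-01-01 00:00:03",
                "2020-01-01 00:00:04",
            ]
        ),
        "data": [
            ["G07", "G08", "G10", "G16"],
            ["G07", "G08", "G16"],
            ["G08", "G10", "G16", "G20", "G21"],
            ["G16", "G20", "G21", "G26", "G27", "R02"],
            ["G07", "G08", "G26", "G27"],
        ],
    }
)

# Learn the set of labels and encode each row as a 0/1 vector.
mlb = MultiLabelBinarizer()
encoded = mlb.fit_transform(df["data"])

# mlb.classes_ holds the sorted labels, used here as column names.
matrix = pd.DataFrame(encoded, columns=mlb.classes_, index=df.index)

# Put the date column back in front of the indicator columns.
result = pd.concat([df[["date"]], matrix], axis=1)

print(result)
```

For a dataset with more than a million rows, passing sparse_output=True to MultiLabelBinarizer returns a SciPy sparse matrix instead of a dense array, which may reduce memory use when most entries are 0.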