Encoding/factorizing lists in a pandas DataFrame

>>> import pandas as pd
>>> df = pd.DataFrame({'A': [
...     ['Other', 'Male', 'Female', 'Male', 'Other'],
...     ['Female', 'Other', 'Male']
... ]})
>>> df['B'] = df.A.apply(lambda x: pd.factorize(x)[0])
>>> df
                                    A                B
0  [Other, Male, Female, Male, Other]  [0, 1, 2, 1, 0]
1               [Female, Other, Male]        [0, 1, 2]

2 answers

User

#1 · edited 2024-05-15 06:11:33

You can use LabelEncoder from sklearn:

Fit the encoder:

from sklearn import preprocessing
le = preprocessing.LabelEncoder()
le.fit([s for l in df.A for s in l])

Transform the column:

[code snippet missing from the source]
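A minimal sketch of what this transform step might look like, assuming the same `df` as in the question and an encoder fitted on every label from every row. Applying `le.transform` per row through `apply` is an assumption here, not the original answer's code.

```python
import pandas as pd
from sklearn import preprocessing

df = pd.DataFrame({'A': [
    ['Other', 'Male', 'Female', 'Male', 'Other'],
    ['Female', 'Other', 'Male'],
]})

# Fit on the flattened labels so that codes are shared across all rows.
le = preprocessing.LabelEncoder()
le.fit([s for l in df.A for s in l])

# Assumed step: encode each row's list separately.
# tolist() converts the NumPy array returned by transform into a plain list.
df['B'] = df.A.apply(lambda labels: le.transform(labels).tolist())
```

LabelEncoder sorts its classes, so the codes follow alphabetical order of the labels, and `le.inverse_transform` can map a list of codes back to the original strings.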

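User

#2 · edited 2024-05-15 06:11:33

You can easily do this yourself using all the values in column A.

First, use a set comprehension to build the set of all unique items in column A. Then use a dict comprehension whose keys are those unique items and whose values are their positions in sorted order, obtained with enumerate.

Then use a list comprehension to look up each entry in the dictionary.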

s = set(item for sublist in df.A for item in sublist)
s = {k: n for n, k in enumerate(sorted(list(s)))}

>>> df.assign(B=[[s[key] for key in sublist] for sublist in df['A']])
                                    A                B
0  [Other, Male, Female, Male, Other]  [2, 1, 0, 1, 2]
1               [Female, Other, Male]        [0, 2, 1]
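The three steps above can also be put together in one self-contained sketch. The helper name `encode_lists` is hypothetical (it is not part of pandas), and the sketch assumes every item in the lists is hashable and sortable.

```python
import pandas as pd


def encode_lists(column):
    """Encode a Series of lists with integer codes shared across all rows."""
    # Step 1: collect every unique item across all sublists.
    unique_items = {item for sublist in column for item in sublist}
    # Step 2: assign codes by position in sorted order.
    codes = {key: n for n, key in enumerate(sorted(unique_items))}
    # Step 3: look up each item's code, row by row.
    encoded = [[codes[key] for key in sublist] for sublist in column]
    return encoded, codes


df = pd.DataFrame({'A': [
    ['Other', 'Male', 'Female', 'Male', 'Other'],
    ['Female', 'Other', 'Male'],
]})

encoded, codes = encode_lists(df['A'])
df = df.assign(B=encoded)
```

Returning the `codes` dictionary alongside the encoded lists keeps the mapping available for decoding later, for example by inverting it with `{v: k for k, v in codes.items()}`.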
