I have a dataframe in which some columns (C1, C2, C3) are categorical (string) variables. The data and data types are as follows:
C1 C2 C3 C4 C5 \
4 b'02e197c5' b'c2ced437' b'a2427619' b'3f85ecae' b'b8c51ab7'
9 b'62770d79' b'ad984203' b'ddd956c1' b'f7f54f97' b'bbaea1c0'
13 b'7ffd46c3' b'710103fd' b'a1407382' b'f2463ffb' b'664ff944'
14 b'9a8cb066' b'7a06385f' b'417e6103' b'6faef306' b'f8990a45'
45 b'6f877ce8' b'58cc2d25' b'9b48ba97' b'f2463ffb' b'd90dd51f'
Data types: (the listing is missing from the original post)
Then I used DictVectorizer to one-hot encode the strings:
labelTransformer = DictVectorizer(dtype='str')
labelTransformer.fit_transform(clickDataFrame["C1"].astype("str"))
But after that I get the following error:
File "click_main.py", line 60, in <module>
df2 = labelTransformer.fit_transform(clickDataFrame["C1"].astype("str"))
File "/usr/local/lib/python3.6/dist-packages/sklearn/feature_extraction/dict_vectorizer.py", line 230, in fit_transform
return self._transform(X, fitting=True)
File "/usr/local/lib/python3.6/dist-packages/sklearn/feature_extraction/dict_vectorizer.py", line 166, in _transform
for f, v in six.iteritems(x):
File "/usr/local/lib/python3.6/dist-packages/sklearn/externals/six.py", line 439, in iteritems
return iter(getattr(d, _iteritems)(**kw))
AttributeError: 'str' object has no attribute 'items'
I have tried many times but still cannot find a solution.
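You can use pd.get_dummies to get a one-hot encoding directly from pandas. To encode each column independently, simply call
pd.get_dummies(df)
or, for a single column,
pd.get_dummies(df.C1)
To get indicators for every unique value across all columns, use
pd.get_dummies(df.stack()).unstack().swaplevel(0, 1, axis=1)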
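A minimal sketch of the pd.get_dummies route, in case a concrete run helps. The toy frame reuses a few values from the sample rows in the question, written as plain strings rather than bytes (an assumption for illustration), and the variable names are hypothetical. Judging from the traceback, DictVectorizer calls .items() on each element of its input, so it wants one dict per row rather than a Series of strings; the last step shows how that shape could be produced with pandas alone.

```python
import pandas as pd

# Toy frame built from the first three sample rows of the question.
# Values are plain str here (an assumption; the original frame holds bytes).
df = pd.DataFrame(
    {
        "C1": ["02e197c5", "62770d79", "7ffd46c3"],
        "C4": ["3f85ecae", "f7f54f97", "f2463ffb"],
    },
    index=[4, 9, 13],
)

# One-hot encode a single column: one indicator column per distinct value.
c1_dummies = pd.get_dummies(df["C1"])

# One-hot encode every string column at once.
# The new columns are named "<column>_<value>".
all_dummies = pd.get_dummies(df)

# One dict per row, e.g. {"C1": "02e197c5"}: the input shape that
# DictVectorizer iterates over, unlike a Series of plain strings.
records = df[["C1"]].to_dict(orient="records")

print(c1_dummies.columns.tolist())
print(all_dummies.columns.tolist())
print(records[0])
```

If the real columns hold bytes, decoding them first (for example with df["C1"].str.decode("utf-8")) should give cleaner column names than astype("str"), which keeps the b'...' wrapper in the text.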