One-hot encoding of categorical data

2 answers

User

#1 · edited 2024-05-16 06:42:47

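There are many different ways to encode categorical variables for machine learning. We have implemented several of them (including one-hot) in the scikit-learn-contrib package category_encoders: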

https://github.com/scikit-learn-contrib/categorical-encoding

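If you are already using scikit-learn and/or pandas, this may be an easy solution to drop in. Given the high dimensionality you mention, and the fact that you don't necessarily know all the categories in advance, you may have better luck with something like HashingEncoder.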
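To make the idea behind a hashing encoder concrete, here is a minimal pure-Python sketch of the hashing trick. It is not the category_encoders API: the function name `hash_encode`, the default of 8 buckets, and the sample data are assumptions chosen for illustration only.

```python
import hashlib


def hash_encode(values, n_components=8):
    """Map each category to a fixed-width row with a single 1 (hypothetical helper)."""
    rows = []
    for value in values:
        # md5 is stable across runs, unlike Python's built-in hash() for str
        digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
        bucket = int(digest, 16) % n_components
        row = [0] * n_components
        row[bucket] = 1
        rows.append(row)
    return rows


# Hypothetical sample data, not taken from the question
animals = ["cat", "dog", "cat", "never-seen-before"]
encoded = hash_encode(animals)
for animal, row in zip(animals, encoded):
    print(animal, row)
```

The output width stays at `n_components` no matter how many distinct categories show up, and a category that was never seen before still gets a valid row. The price is that two different categories can land in the same bucket, so a larger `n_components` trades memory for fewer collisions.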

User

#2 · edited 2024-05-16 06:42:47

Demo:

In [64]: from sklearn.feature_extraction.text import CountVectorizer

In [65]: cv = CountVectorizer()

In [66]: X = cv.fit_transform(mesh)

In [67]: X.A
Out[67]:
array([[1, 1, 1, 0],
       [1, 1, 0, 1]], dtype=int64)

Column names:

In [68]: cv.get_feature_names()
Out[68]: ['aligator', 'cat', 'dog', 'mouse']

We can use pandas.SparseDataFrame:

In [135]: import pandas as pd

In [136]: pd.SparseDataFrame(X, columns=cv.get_feature_names(), default_fill_value=0)
Out[136]:
   aligator  cat  dog  mouse
0         1    1    1      0
1         1    1    0      1
