RandomForestClassifier.fit(): ValueError: could not convert string to float

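import pandas as pd
from sklearn.ensemble import RandomForestClassifier

cols = ['A', 'B', 'C']
col_types = {'A': str, 'B': str, 'C': int}
test = pd.read_csv('test.csv', dtype=col_types)
train_y = test['C'] == 1
train_x = test[cols]
clf_rf = RandomForestClassifier(n_estimators=50)
clf_rf.fit(train_x, train_y)  # raises ValueError: could not convert string to float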

3 Answers

User

#1 · edited 2024-05-01 21:58:42

You cannot pass str values to the model's fit() method. As the documentation for the X parameter of fit() states:

The training input samples. Internally, it will be converted to dtype=np.float32 and if a sparse matrix is provided to a sparse csc_matrix.

Try converting your data to float, and give LabelEncoder a try for the string columns.
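A minimal sketch of that suggestion, applied to the layout from the question. The small in-memory DataFrame stands in for test.csv (its values are made up for illustration), and one LabelEncoder is fitted per string column. Column C is left out of the features because the target is derived from it; that is an assumption about what the asker intended.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder

# Made-up stand-in for test.csv: two string columns and one integer column.
test = pd.DataFrame({
    'A': ['x', 'y', 'x', 'z', 'y', 'z'],
    'B': ['p', 'q', 'q', 'p', 'p', 'q'],
    'C': [1, 0, 1, 0, 1, 0],
})

train_y = test['C'] == 1
train_x = test[['A', 'B']].copy()

# Fit one encoder per string column and keep it, so the same
# string-to-integer mapping can be reused on new data later.
encoders = {}
for col in ['A', 'B']:
    encoders[col] = LabelEncoder()
    train_x[col] = encoders[col].fit_transform(train_x[col])

clf_rf = RandomForestClassifier(n_estimators=50, random_state=0)
clf_rf.fit(train_x, train_y)  # all-numeric input, so no ValueError
```

Keeping the fitted encoders around matters at prediction time: calling encoders[col].transform(...) on new data applies the mapping learned during training instead of inventing a new one.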

User

#2 · edited 2024-05-01 21:58:42

Label encoding worked for me. Basically, you have to encode your data feature by feature (myData is a 2D array of string dtype):

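import numpy as np
from sklearn import preprocessing

# filecsv is the path to your CSV file; every field is read as a byte string.
myData = np.genfromtxt(filecsv, delimiter=",", dtype="|S20", skip_header=1)

le = preprocessing.LabelEncoder()
# Write the codes into a separate float array; assigning them back into
# myData would store them as strings again.
encoded = np.empty(myData.shape, dtype=np.float64)
for i in range(myData.shape[1]):  # one pass per feature column
    encoded[:, i] = le.fit_transform(myData[:, i])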

User

#3 · edited 2024-05-01 21:58:42

You have to do some encoding before calling fit(). As already said, fit() does not accept strings, but you can work around that.

There are several classes you can use:

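LabelEncoder: turns each distinct string into an incremental integer value
OneHotEncoder: uses the one-of-K scheme to turn each string into a set of binary indicator columns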
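To make the difference between the two concrete, here is a small sketch on a made-up array of color names (the data is purely illustrative). LabelEncoder sorts the distinct values and assigns each one an integer, while OneHotEncoder produces one 0/1 column per distinct value.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Illustrative data only.
colors = np.array(['red', 'green', 'blue', 'green'])

# LabelEncoder: one integer per value; classes are sorted alphabetically,
# so blue -> 0, green -> 1, red -> 2.
le = LabelEncoder()
labels = le.fit_transform(colors)
print(labels)       # [2 1 0 1]

# OneHotEncoder expects a 2D input (rows x features), hence the reshape.
# It returns a sparse matrix by default; toarray() makes it dense for printing.
ohe = OneHotEncoder()
one_hot = ohe.fit_transform(colors.reshape(-1, 1)).toarray()
print(one_hot.shape)  # (4, 3): one column each for blue, green, red
```

Note that the integers from LabelEncoder carry an ordering (blue < green < red) that has no real meaning; one-hot encoding avoids that at the cost of extra columns.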

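Personally, I posted almost the same question on StackOverflow some time ago. I wanted a scalable solution but did not get any answer. I ended up choosing OneHotEncoder, which binarizes all the strings. It works quite well, but if you have many distinct strings the matrix grows very quickly and a lot of memory is required.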
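A rough sketch of that growth, on hypothetical data: one column has 4 distinct strings and the other has a different string on every row, so the encoded matrix ends up with 4 + 200 columns. OneHotEncoder returns a SciPy sparse matrix by default, which keeps memory manageable, and RandomForestClassifier accepts sparse input directly. The column contents and sizes below are assumptions chosen only to show the effect.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

n_rows = 200

# Hypothetical data: a low-cardinality column and a high-cardinality one.
low_card = np.array(['a', 'b', 'c', 'd'] * 50)              # 4 distinct values
high_card = np.array([f'id_{i}' for i in range(n_rows)])    # 200 distinct values
X_raw = np.column_stack([low_card, high_card])
y = low_card == 'a'

# handle_unknown='ignore' encodes strings unseen during fit as all zeros
# instead of raising an error at transform time.
encoder = OneHotEncoder(handle_unknown='ignore')
X_encoded = encoder.fit_transform(X_raw)  # sparse matrix, not a dense array

print(X_raw.shape)      # (200, 2)
print(X_encoded.shape)  # (200, 204): one column per distinct string

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X_encoded, y)   # sparse input is accepted as-is
```

Calling X_encoded.toarray() would materialize all 204 columns densely; with real data and thousands of distinct strings, that is where the memory cost mentioned above shows up.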
