How do I train an sklearn KNeighborsClassifier on a single feature of a dataset and use it to predict values?

Posted 2024-03-28 15:05:54


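I read a CSV dataset into a pandas DataFrame and split the data into training and test sets. I want to train the classifier on one feature at a time so that I can find out which single feature is the best predictor. I'm new to both Python and machine learning; this is actually my first attempt at either, so any pointers are welcome. On the line my_knn_for_cs4661.fit(X_train[col], y_train) I get an error that mentions array.reshape(-1,1). I tried X_train[col].reshape(-1,1) instead, but that gave me a different error. I'm using Python 3 in a Jupyter notebook with sklearn, numpy, and pandas.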

Here are my code and the error:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_df = pd.read_csv('https://raw.githubusercontent.com/mpourhoma/CS4661/master/iris.csv')
feature_cols = ['sepal_length','sepal_width','petal_length','petal_width']
X = iris_df[feature_cols] 
y = iris_df['species']
predictions= {}

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=6)

k = 3
my_knn_for_cs4661 = KNeighborsClassifier(n_neighbors=k)

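for col in feature_cols:
    # fails: X_train[col] is a 1-D Series, but fit() expects 2-D input
    my_knn_for_cs4661.fit(X_train[col], y_train)
    # would also fail once fit() works: X_test still has all four feature columns
    y_predict = my_knn_for_cs4661.predict(X_test)
    predictions[col] = y_predict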

My error:

ValueError: Expected 2D array, got 1D array instead: ...
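The root cause is the shape of the input. The following minimal sketch uses a hypothetical three-row DataFrame in place of the real X_train to compare shapes: single-bracket selection yields a 1-D Series, which KNeighborsClassifier.fit rejects, while double brackets or .to_numpy().reshape(-1, 1) yield a 2-D (n_samples, 1) input. The attempt to call .reshape on the Series itself most likely failed because current pandas Series objects have no reshape method, so the reshape has to go through the underlying NumPy array.

```python
import pandas as pd

# Hypothetical stand-in for X_train (invented values, not the real iris rows)
X_train = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "sepal_width": [3.5, 3.0, 3.3],
})

col = "sepal_length"

# Single brackets return a Series: 1-D, shape (n_samples,)
print(X_train[col].shape)  # (3,)

# Double brackets return a one-column DataFrame: 2-D, shape (n_samples, 1)
print(X_train[[col]].shape)  # (3, 1)

# Reshaping the underlying NumPy array gives the same 2-D shape
print(X_train[col].to_numpy().reshape(-1, 1).shape)  # (3, 1)
```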

2 answers

I found a solution, although it looks hacky and I'm not sure it is the Pythonic way:

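import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris_df = pd.read_csv('https://raw.githubusercontent.com/mpourhoma/CS4661/master/iris.csv')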
feature_cols = ['sepal_length','sepal_width','petal_length','petal_width']
X = iris_df[feature_cols] 
y = iris_df['species']
predictions= {}

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=6)

k = 3
my_knn_for_cs4661 = KNeighborsClassifier(n_neighbors=k)

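for col in feature_cols:
    # .values is a 1-D NumPy array; reshape(-1,1) turns it into one column of shape (n_samples, 1)
    my_knn_for_cs4661.fit(X_train[col].values.reshape(-1,1), y_train)
    # the test data must use the same single column, reshaped the same way
    y_predict = my_knn_for_cs4661.predict(X_test[col].values.reshape(-1,1))
    # stores the test accuracy for each feature, not the raw predictions
    predictions[col] = accuracy_score(y_test, y_predict)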


print(predictions)
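A self-contained variant of the same per-feature loop is sketched here. It assumes scikit-learn 0.23 or later (for load_iris(as_frame=True)) and uses scikit-learn's bundled copy of the iris data instead of the CSV URL, so it runs offline; its column names (such as "sepal length (cm)") and integer-encoded species labels differ from the CSV's. Each column is selected with double brackets, which already yields a 2-D input, and the feature with the highest test accuracy is picked at the end. Because the row order may differ from the CSV, the accuracies need not match those of the answer's code.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Bundled iris data as a DataFrame (features) and a Series (integer labels 0-2)
iris = load_iris(as_frame=True)
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=6
)

accuracies = {}
for col in X.columns:
    # A fresh estimator per feature keeps each fit independent
    knn = KNeighborsClassifier(n_neighbors=3)
    # X_train[[col]] is a one-column DataFrame, already 2-D: no reshape needed
    knn.fit(X_train[[col]], y_train)
    y_predict = knn.predict(X_test[[col]])
    accuracies[col] = accuracy_score(y_test, y_predict)

# The single feature with the highest test accuracy
best_feature = max(accuracies, key=accuracies.get)
print(accuracies)
print("best single feature:", best_feature)
```

Reusing one estimator, as the answer does, is equivalent, since each call to fit() discards the previous training data.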

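"Expected 2D array, got 1D array instead" means that KNeighborsClassifier expects the training data X to be two-dimensional, with shape (n_samples, n_features). X_train[col] is a one-dimensional Series, which is why the error appears. Selecting columns with a list (double brackets) returns a DataFrame, which is 2-D even for a single column such as X_train[['sepal_length']], so one feature is enough; you can also pass several features, for example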

X_train[['sepal_length', 'sepal_width']]
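To check that a single feature works once the input is 2-D, here is a minimal sketch on a hypothetical six-row toy DataFrame (values invented for illustration, not taken from the iris CSV). Both a one-column and a two-column selection fit without error, and prediction requires input with the same columns as training.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two numeric features and a class label
df = pd.DataFrame({
    "sepal_length": [5.0, 5.1, 4.9, 6.5, 6.7, 6.3],
    "sepal_width": [3.4, 3.5, 3.0, 3.0, 3.1, 2.9],
    "species": ["setosa"] * 3 + ["virginica"] * 3,
})
y = df["species"]

# One feature, but 2-D (shape (6, 1)): fits without error
one_feature = KNeighborsClassifier(n_neighbors=3).fit(df[["sepal_length"]], y)

# Two features (shape (6, 2)) also work, as in the example above
two_features = KNeighborsClassifier(n_neighbors=3).fit(
    df[["sepal_length", "sepal_width"]], y
)

# New samples must have the same column(s) as the training data
new = pd.DataFrame({"sepal_length": [6.6]})
print(one_feature.predict(new))  # ['virginica']
```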
