Applying an sklearn function to a pandas DataFrame raises ValueError("Unknown label type: %r" % y)

Posted 2024-05-12 14:36:01

The following code produces an error message:

    >>> import pandas as pd
    >>> from sklearn import preprocessing, svm
    >>> df = pd.DataFrame({"a": [0,1,2], "b":[0,1,2], "c": [0,1,2]})
    >>> clf = svm.SVC()
    >>> df = df.apply(lambda x: preprocessing.scale(x))
    >>> clf.fit(df[["a", "b"]], df["c"])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Users\Alexander\Anaconda\lib\site-packages\sklearn\svm\base.py", lin
     151, in fit
        y = self._validate_targets(y)
      File "C:\Users\Alexander\Anaconda\lib\site-packages\sklearn\svm\base.py", lin
     515, in _validate_targets
        check_classification_targets(y)
      File "C:\Users\Alexander\Anaconda\lib\site-packages\sklearn\utils\multiclass.
    y", line 173, in check_classification_targets
        raise ValueError("Unknown label type: %r" % y)
    ValueError: Unknown label type: 0   -1.224745
    1    0.000000
    2    1.224745
    Name: c, dtype: float64

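The DataFrame's dtype is not object, so applying the sklearn SVM function should work, but for some reason it does not recognize the classification labels. What is causing this problem?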


1 answer
User
Reply 1 · Posted 2024-05-12 14:36:01

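The problem is that after the scaling step the labels are floats, which is not a valid label type for a classifier. If you convert them to int or str, it should work: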

    In [32]: clf.fit(df[["a", "b"]], df["c"].astype(int))
    Out[32]:
    SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
      decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
      max_iter=-1, probability=False, random_state=None, shrinking=True,
      tol=0.001, verbose=False)
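
Note that `astype(int)` truncates the scaled values, so the labels become -1, 0, 1 rather than the original 0, 1, 2. A possible alternative, sketched below on the assumption that only the feature columns need standardizing, is to scale `a` and `b` and leave the label column `c` untouched. The names `feature_cols`, `X`, `y`, `scaled_label_kind` and `label_kind` are illustrative and not from the question. `type_of_target` is used here only to show how sklearn classifies the two versions of the label column.

```python
import pandas as pd
from sklearn import preprocessing, svm
from sklearn.utils.multiclass import type_of_target

df = pd.DataFrame({"a": [0, 1, 2], "b": [0, 1, 2], "c": [0, 1, 2]})

# Illustrative assumption: "a" and "b" are features, "c" holds the class labels.
feature_cols = ["a", "b"]

# Scale the features only; preprocessing.scale returns a NumPy array,
# so wrap it back into a DataFrame to keep the column names and index.
X = pd.DataFrame(
    preprocessing.scale(df[feature_cols]),
    columns=feature_cols,
    index=df.index,
)
y = df["c"]  # labels stay as the original integers

# Scaled labels are non-integer floats, which sklearn treats as a
# regression-style target; the unscaled integers are class labels.
scaled_label_kind = type_of_target(preprocessing.scale(df["c"].to_numpy()))
label_kind = type_of_target(y)
print(scaled_label_kind, label_kind)

clf = svm.SVC()
clf.fit(X, y)  # no "Unknown label type" error, since y was never scaled
predictions = clf.predict(X)
```

With this approach `clf.classes_` keeps the original label values, so predictions come back in the same coding as column `c`.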
