DataFrameMapper scikit-learn ValueError: all the input array dimensions except for the concatenation axis must match exactly

Posted 2024-04-23 11:32:50


I have been trying to use DataFrameMapper to add several preprocessing transformations on a DataFrame to a scikit-learn pipeline.

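import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer, Normalizer
from sklearn.tree import DecisionTreeClassifier
from sklearn_pandas import DataFrameMapper

# Load the UCI abalone dataset; the file has no header row
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"
names = ['Sex', 'Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight', 'Viscera weight', 'Shell weight', 'Rings']
df = pd.read_csv(url, names=names)

# Map each selected column to its preprocessing transformer
mapper = DataFrameMapper(
    [('Height', Normalizer()), ('Sex', LabelBinarizer())]
)

# Pipeline: the mapper followed by a decision tree
estimator = DecisionTreeClassifier()
stages = [("mapper", mapper), ("dtree", estimator)]
pipeline = Pipeline(stages)

# Separate the target column from the features
labelCol = 'Rings'
target = df[labelCol]
data = df.drop(labelCol, axis=1)

train_data, test_data, train_target, expected = train_test_split(data, target, test_size=0.25, random_state=33)

model = pipeline.fit(train_data, train_target)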

However, I get the following error:

ValueError: all the input array dimensions except for the concatenation axis must match exactly

What am I missing?

Thanks :)


1 answer
User
#1 · Posted 2024-04-23 11:32:50

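You need to change the column selector passed to DataFrameMapper, giving 'Height' as a one-element list instead of a plain string: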

mapper = DataFrameMapper(
    [(['Height'], Normalizer()), ('Sex', LabelBinarizer())]
)
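
One plausible reading of the error message, assuming the mapper stacks each transformer's output side by side: if the 'Height' transformer returns a single row while the binarized 'Sex' column has one row per sample, the row counts disagree and the stacking fails. The shapes and variable names below are hypothetical stand-ins built with NumPy only; they are not output captured from sklearn_pandas.

```python
import numpy as np

n_samples = 4

# Hypothetical shapes: 'Height' treated as one row of n_samples values,
# next to a binarized 'Sex' column with three classes.
height_as_row = np.ones((1, n_samples))
sex_binarized = np.zeros((n_samples, 3))

try:
    np.hstack([height_as_row, sex_binarized])
    stacked_ok = True
except ValueError as err:
    # Row counts differ (1 vs 4), so horizontal stacking is rejected.
    stacked_ok = False
    print("ValueError:", err)

# With 'Height' as a column vector, the row counts agree.
height_as_column = height_as_row.reshape(-1, 1)
features = np.hstack([height_as_column, sex_binarized])
print(features.shape)
```

Selecting the column as ['Height'] hands the transformer a column vector, so its output keeps one row per sample and lines up with the other columns.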

The difference between 'Height' and ['Height'] is a subtle detail that is covered in the sklearn_pandas documentation:

Map the Columns to Transformations

The difference between specifying the column selector as 'column' (as a simple string) and ['column'] (as a list with one element) is the shape of the array that is passed to the transformer. In the first case, a one dimensional array will be passed, while in the second case it will be a 2-dimensional array with one column, i.e. a column vector.

[...]

Be aware that some transformers expect a 1-dimensional input (the label-oriented ones) while some others, like OneHotEncoder or Imputer, expect 2-dimensional input, with the shape [n_samples, n_features].
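
The shape difference described in the quote can be seen with plain pandas indexing. This is only an analogy, assuming the mapper selects columns in a comparable way; the small DataFrame is made up for illustration and is not the abalone data.

```python
import pandas as pd

# Tiny made-up stand-in for the abalone columns used in the question.
df = pd.DataFrame({
    "Height": [0.095, 0.090, 0.135, 0.125],
    "Sex": ["M", "M", "F", "I"],
})

one_dim = df["Height"].to_numpy()    # selector as a plain string
two_dim = df[["Height"]].to_numpy()  # selector as a one-element list

print(one_dim.shape)  # (4,)
print(two_dim.shape)  # (4, 1)
```

Label-oriented transformers such as LabelBinarizer accept the 1-dimensional form, which is why 'Sex' can stay a plain string in the mapper.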
