DataFrameMapper scikit-learn ValueError: all the input array dimensions except for the concatenation axis must match exactly

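Fitting the following pipeline raises this error:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import LabelBinarizer, Normalizer
from sklearn.tree import DecisionTreeClassifier
from sklearn_pandas import DataFrameMapper

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"
names = ['Sex', 'Length', 'Diameter', 'Height', 'Whole weight',
         'Shucked weight', 'Viscera weight', 'Shell weight', 'Rings']
df = pd.read_csv(url, names=names)

# 'Height' is given as a plain string, so Normalizer receives a 1-D array
mapper = DataFrameMapper(
    [('Height', Normalizer()), ('Sex', LabelBinarizer())]
)

stages = []
stages += [("mapper", mapper)]
estimator = DecisionTreeClassifier()
stages += [("dtree", estimator)]
pipeline = Pipeline(stages)

labelCol = 'Rings'
target = df[labelCol]
data = df.drop(labelCol, axis=1)

train_data, test_data, train_target, expected = train_test_split(
    data, target, test_size=0.25, random_state=33)
model = pipeline.fit(train_data, train_target)  # fails here with the error in the title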

1 answer

User

#1 · posted 2024-04-23 11:32:50

You need to pass 'Height' to the DataFrameMapper as a one-element list instead of a plain string:

mapper = DataFrameMapper(
    [(['Height'], Normalizer()), ('Sex', LabelBinarizer())]
)
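To see why the shape matters, here is a minimal numpy-only sketch. It assumes, as I understand sklearn_pandas, that the mapper joins the transformer outputs side by side with np.hstack, and that older scikit-learn versions turned a 1-D input into a single row of shape (1, n_samples) rather than a single column (recent versions refuse 1-D input with an "Expected 2D array" error instead). The height values and the one-hot matrix standing in for the LabelBinarizer output are hypothetical.

```python
import numpy as np

# Hypothetical 'Height' values for four samples
heights = np.array([0.095, 0.090, 0.135, 0.100])

# Stand-in for LabelBinarizer output on 'Sex' (three classes): shape (4, 3)
sex_onehot = np.eye(3)[[0, 0, 1, 2]]

# 'Height' as a plain string: a 1-D array that, treated as one sample,
# becomes a single row of shape (1, 4)
as_row = np.atleast_2d(heights)

# ['Height'] as a list: a column vector of shape (4, 1)
as_column = heights.reshape(-1, 1)

# Column vector: the row counts agree, so stacking works -> shape (4, 4)
stacked = np.hstack([as_column, sex_onehot])
print(stacked.shape)

# Row vector: 1 row vs 4 rows, so np.hstack raises the ValueError from the title
try:
    np.hstack([as_row, sex_onehot])
except ValueError as err:
    print(err)
```

The column-vector version lines up one row per sample with the one-hot columns, which is exactly what the list selector ['Height'] provides.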

This is a subtle detail, described in the sklearn_pandas documentation:

Map the Columns to Transformations
The difference between specifying the column selector as 'column' (as a simple string) and ['column'] (as a list with one element) is the shape of the array that is passed to the transformer. In the first case, a one dimensional array will be passed, while in the second case it will be a 2-dimensional array with one column, i.e. a column vector.
[...]
Be aware that some transformers expect a 1-dimensional input (the label-oriented ones) while some others, like OneHotEncoder or Imputer, expect 2-dimensional input, with the shape [n_samples, n_features].
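The difference described in the quote can be checked directly in pandas: selecting with a string returns a Series (1-D), while selecting with a one-element list returns a DataFrame (2-D). The small frame below is hypothetical data that only reuses two column names from the question.

```python
import pandas as pd

# Hypothetical rows using two columns from the abalone data
df = pd.DataFrame({
    "Sex": ["M", "M", "F", "I"],
    "Height": [0.095, 0.090, 0.135, 0.100],
})

# 'Height' as a string: a Series, i.e. a 1-D array
one_d = df["Height"].to_numpy()
# ['Height'] as a list: a one-column DataFrame, i.e. a column vector
two_d = df[["Height"]].to_numpy()

print(one_d.shape)  # (4,)
print(two_d.shape)  # (4, 1)
```

This is why 'Sex' can stay a plain string for LabelBinarizer, a label-oriented transformer that expects 1-D input, while Normalizer, which expects [n_samples, n_features], needs ['Height'].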
