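I trained a logistic regression model for multiclass classification of text data. I want to generate a prediction for a single sample from the model, but I get this error: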
ValueError: X has 30 features per sample; expecting 100000
Below is the code that vectorizes the text data:
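from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
# One TF-IDF step, capped at 50,000 features per text column
tfidf_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=50000, ngram_range=(1, 3),
                              stop_words='english', strip_accents='ascii')),
])
# Vectorize each text column separately and concatenate the results
preprocessor_pipeline = ColumnTransformer(
    transformers=[
        ('short_description', tfidf_pipeline, 'short_description'),
        ('details', tfidf_pipeline, 'details'),
    ])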
Below is the code I am trying to run, which raises the error shown above:
import pandas as pd

d = {'short_description': ['[mitigated] [ubl5] ssd slam station not working'],
     'details': ['ssd slam station not working, unable to take slam from the station.']}
df_test = pd.DataFrame(data=d)
X = df_test[['short_description', 'details']]
X_prep = preprocessor_pipeline.fit_transform(X)
y_p = lr.predict(X_prep)  # raises the ValueError shown above
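The preprocessor_pipeline used in the training and test steps must be the same fitted object. Calling fit_transform on the new sample refits the vectorizers on that single row, so the output has only as many columns as that row has n-grams (30 here) instead of the 100000 columns the model was trained on.
(The minimal reproducible example and its output from the original answer are not preserved here.)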
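The feature-count mismatch can be reproduced at small scale. The following is a minimal sketch, not the example from the original answer: the toy corpus and variable names are hypothetical, and a single default TfidfVectorizer stands in for the full preprocessor.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus standing in for the training text
train_texts = [
    "ssd slam station not working",
    "printer out of paper",
    "badge reader offline",
]
new_text = ["scanner not working"]

# fit_transform() on the training text learns the training vocabulary
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Refitting on the new sample learns a fresh, much smaller vocabulary,
# so the matrix has a different number of columns than X_train
X_refit = TfidfVectorizer().fit_transform(new_text)

# transform() reuses the training vocabulary, so the width matches X_train
X_reused = vectorizer.transform(new_text)

print("train:", X_train.shape[1],
      "refit:", X_refit.shape[1],
      "reused:", X_reused.shape[1])
```

A model trained on X_train can only accept input with the same number of columns, which is why only the transform() result is usable for prediction.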
In other words, the prediction step needs transform rather than fit_transform.
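A sketch of the corrected flow, under stated assumptions: the training rows and labels below are hypothetical stand-ins (the question does not show its training data), and the vectorizers use default settings to keep the example small. The point is only that the preprocessor is fitted once during training and then reused with transform() at prediction time.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy training data and labels, not from the original post
train = pd.DataFrame({
    "short_description": [
        "ssd slam station not working",
        "printer out of paper",
        "badge reader offline",
        "label printer jammed",
    ],
    "details": [
        "unable to take slam from the station",
        "paper tray is empty on floor two",
        "badge reader shows no power light",
        "labels stuck inside the printer",
    ],
})
y_train = ["station", "printer", "station", "printer"]

preprocessor_pipeline = ColumnTransformer(transformers=[
    ("short_description", TfidfVectorizer(), "short_description"),
    ("details", TfidfVectorizer(), "details"),
])

# Training: fit the preprocessor once, on the training data only
X_train = preprocessor_pipeline.fit_transform(train)
lr = LogisticRegression().fit(X_train, y_train)

# Prediction: reuse the fitted preprocessor with transform(), not fit_transform()
df_test = pd.DataFrame({
    "short_description": ["[mitigated] [ubl5] ssd slam station not working"],
    "details": ["ssd slam station not working, unable to take slam from the station."],
})
X_prep = preprocessor_pipeline.transform(df_test[["short_description", "details"]])
y_p = lr.predict(X_prep)
print(y_p)
```

If prediction runs in a separate process from training, the fitted preprocessor_pipeline has to be saved and loaded alongside the model; one common way to keep the two together is to put both in a single scikit-learn Pipeline.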