How to use a trained text classification model

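import string

import pandas as pd
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = pd.read_csv('data1.csv', encoding='cp1252')

def pre_process(text):
    # Strip punctuation, drop English stop words, then stem the remaining words.
    text = text.translate(str.maketrans('', '', string.punctuation))
    text = [word for word in text.split() if word.lower() not in stopwords.words('english')]
    stemmer = SnowballStemmer("english")
    words = ""
    for i in text:
        words += stemmer.stem(i) + " "
    return words

textFeatures = data['textForCategorized'].copy()
textFeatures = textFeatures.apply(pre_process)

# "english" must be passed as stop_words; as a bare positional argument it is not a stop-word setting.
vectorizer = TfidfVectorizer(stop_words='english')
features = vectorizer.fit_transform(textFeatures)

features_train, features_test, labels_train, labels_test = train_test_split(
    features, data['class'], test_size=0.3, random_state=111)

svc = SVC(kernel='sigmoid', gamma=1.0)
clf = svc.fit(features_train, labels_train)
prediction = svc.predict(features_test)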

2 Answers

User

#1 · Edited 2024-06-16 18:30:18

I ran into the same problem and solved it by resizing the features of the single string to match the shape of the training data.

Full code:

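import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Save the trained classifier, then load it back.
joblib.dump(clf, "model.pkl")
classifier = joblib.load("model.pkl")

# Pre-process the new text with the same pre_process() used for training.
textFeature = "Dengue soaring in ......"
vocabulary = pre_process(textFeature)
vocabulary_df = pd.Series(vocabulary)

# Feature extraction using TfidfVectorizer.
# Note: this fits a new vectorizer on the single string, so its vocabulary
# is learned from that string alone rather than from the training data.
vectorizer = TfidfVectorizer(stop_words='english')
test_ = vectorizer.fit_transform(vocabulary_df.values)

# Resize the 1 x n matrix to the column count of the training features.
test_.resize(1, features_train.shape[1])
classifier.predict(test_)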
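
One caveat about the resize step, as far as I understand scikit-learn: calling fit_transform on the single new string learns a fresh vocabulary from that string alone. After the resize the column count matches the training matrix, but the columns need not refer to the same words. Here is a minimal sketch on a made-up toy corpus (train_texts, new_text and the variable names are illustrative assumptions, not the question's data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy stand-ins for the pre-processed training texts and the new text (hypothetical data).
train_texts = ["dengue cases soaring", "football match tonight", "dengue fever outbreak"]
new_text = ["dengue outbreak tonight"]

# Vectorizer fitted on the training texts: one column per training word.
train_vectorizer = TfidfVectorizer()
train_matrix = train_vectorizer.fit_transform(train_texts)

# Refitting on the new text alone learns a different, smaller vocabulary.
refit_vectorizer = TfidfVectorizer()
refit_matrix = refit_vectorizer.fit_transform(new_text)
refit_matrix.resize(1, train_matrix.shape[1])  # only pads the width

# Reusing the training vectorizer keeps the column-to-word mapping intact.
aligned_matrix = train_vectorizer.transform(new_text)

same_width = refit_matrix.shape == aligned_matrix.shape
same_vocabulary = refit_vectorizer.vocabulary_ == train_vectorizer.vocabulary_
print(same_width, same_vocabulary)
```

Both matrices end up with the same width, yet the two vocabularies differ, so a classifier trained on train_matrix would read the resized row against the wrong words.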

User

#2 · Edited 2024-06-16 18:30:18

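You should pre-process new_observation before feeding it to the model. In your case you only pre-processed textFeatures for training; you have to repeat the same pre-processing steps for new_observation as well:

Apply the pre_process() function to new_observation
Transform the output of pre_process(new_observation) with the fitted vectorizer (vectorizer.transform, not fit_transform)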
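
The two steps above could look roughly like this. It is only a sketch under assumptions: simple_pre_process is a simplified stand-in for the question's pre_process() (no NLTK, so it runs without extra downloads), the training texts, labels and new sentence are invented, the file names are arbitrary, and a linear kernel replaces the question's sigmoid kernel to keep the toy example stable. What it illustrates is saving the fitted vectorizer alongside the classifier and reusing it with transform() on the new observation.

```python
import os
import string
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC


def simple_pre_process(text):
    # Simplified stand-in for the question's pre_process(): lowercase and strip punctuation.
    return text.lower().translate(str.maketrans('', '', string.punctuation))


# Invented training data, for illustration only.
train_texts = ["Dengue cases are soaring!", "Dengue fever outbreak reported.",
               "Football match tonight.", "The football team won the match."]
train_labels = ["health", "health", "sport", "sport"]

# Training: pre-process, fit the vectorizer, fit the classifier.
vectorizer = TfidfVectorizer(stop_words='english')
features_train = vectorizer.fit_transform([simple_pre_process(t) for t in train_texts])
clf = SVC(kernel='linear').fit(features_train, train_labels)

# Persist BOTH fitted objects, then load them back as a prediction script would.
model_dir = tempfile.mkdtemp()
joblib.dump(vectorizer, os.path.join(model_dir, "vectorizer.pkl"))
joblib.dump(clf, os.path.join(model_dir, "model.pkl"))
loaded_vectorizer = joblib.load(os.path.join(model_dir, "vectorizer.pkl"))
loaded_clf = joblib.load(os.path.join(model_dir, "model.pkl"))

# Step 1: pre-process the new observation. Step 2: transform it without refitting.
new_observation = "Dengue outbreak soaring again"
new_features = loaded_vectorizer.transform([simple_pre_process(new_observation)])
prediction = loaded_clf.predict(new_features)
print(new_features.shape, prediction)
```

Because transform() reuses the vocabulary learned during training, new_features already has the same number of columns as features_train and no resizing is needed.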
