Python: Using a list in TF-IDF

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

tfidf_vectorizer = TfidfVectorizer(norm=None)

# join each row's token list into one space-separated string
list_contents = []
for index, row in df.iterrows():
    list_contents.append(' '.join(row.Tokens))
# list_contents = df.Content.values

tfidf_matrix = tfidf_vectorizer.fit_transform(list_contents)

# get_feature_names() was removed in scikit-learn 1.2;
# get_feature_names_out() is its replacement
df_tfidf = pd.DataFrame(tfidf_matrix.toarray(),
                        columns=tfidf_vectorizer.get_feature_names_out())
df_tfidf.head(10)

1 Answer

User

#1 · Posted 2024-06-17 10:20:14

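I'm not sure I understand what you're asking, but if you want the vectorizer to consider only a fixed list of words, you can use the vocabulary parameter.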

my_words = ["foo","bar","baz"]

# set the vocabulary parameter with your list of words
tfidf_vectorizer = TfidfVectorizer(
    norm=None,
    vocabulary=my_words)  

list_contents =[]
for index, row in df.iterrows():
    list_contents.append(' '.join(row.Tokens))

# this matrix will have only 3 columns because we have forced
# the vectorizer to use just the words foo bar and baz
# so it'll ignore all other words in the documents.
tfidf_matrix = tfidf_vectorizer.fit_transform(list_contents)
