Converting phrases to vectors with a while loop in Python

Article 1. It is not good to eat pizza after midnight
Article 2. I wouldn't survive a day without stackexchange
Article 3. All of these are just random phrases
Article 4. To prove if my experiment works.
Article 5. The red dog jumps over the lazy fox

```python
from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(min_df=1)
n = 0
while n < 5:
    n = n + 1
    a = ('Article %(number)s' % {'number': n})
    print(a)
    with open("LISR2.txt") as openfile:
        for line in openfile:
            if a in line:
                X = line
    print(vectorizer.fit_transform(X))
```

2 Answers

User

#1 · edited 2024-05-13 17:21:03

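This problem occurs when you pass raw data, that is, when you pass the string directly to the extraction function. Instead, set Y = [X] and pass Y as the argument; then you will get the correct result. I ran into this problem too.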
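A minimal sketch of applying this to the question's loop, assuming LISR2.txt holds one `Article N.` phrase per line (the data in the question appears flattened, so the one-per-line layout is an assumption). Instead of overwriting `X` on every pass, it collects each matching line into a list of documents. `collect_articles` is a hypothetical helper name, not part of scikit-learn.

```python
def collect_articles(lines, count=5):
    """Hypothetical helper: return the lines for Article 1..count, in order.

    Assumes each article sits on its own line and starts with 'Article N.'.
    """
    lines = [line.strip() for line in lines]
    docs = []
    n = 0
    while n < count:
        n += 1
        # Trailing period so that "Article 1" cannot also match "Article 10".
        label = f"Article {n}."
        for line in lines:
            if line.startswith(label):
                docs.append(line)
                break  # keep only the first matching line for this number
    return docs
```

With the list in hand, call the vectorizer once on the whole corpus, e.g. `with open("LISR2.txt") as f: docs = collect_articles(f)` followed by `vectorizer.fit_transform(docs)`. Fitting once matters: calling `fit_transform` separately for each article would learn a different vocabulary each time, so the resulting vectors would not share columns. For a single article, `[docs[0]]` is the `Y = [X]` fix described above.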

User

#2 · edited 2024-05-13 17:21:03

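Look at the docs. They say that CountVectorizer.fit_transform expects an iterable of strings (for example, a list of strings). You are passing a single string instead, and iterating over a string yields its individual characters, not documents.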
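To make the difference concrete, the sketch below shows what code expecting an iterable of documents sees when handed a bare string, plus a hypothetical `as_documents` helper that wraps a lone string in a list (the same idea as `Y = [X]` in the first answer). As far as I know, recent scikit-learn versions reject a bare `str` in `CountVectorizer.fit_transform` with a `ValueError`, while older ones iterated over the characters, which typically ended in an "empty vocabulary" error because the default tokenizer ignores single-character tokens.

```python
def as_documents(x):
    """Hypothetical helper: wrap a lone string in a list, otherwise list the items."""
    if isinstance(x, str):
        return [x]
    return list(x)


X = "Article 5. The red dog jumps over the lazy fox"

# Iterating a str yields characters, so each "document" would be one character.
print(list(X)[:4])           # ['A', 'r', 't', 'i']
# Wrapped in a list, the whole line is a single document.
print(as_documents(X))       # ['Article 5. The red dog jumps over the lazy fox']
print(len(as_documents(X)))  # 1
```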

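That makes sense: fit_transform in scikit-learn does two things: 1) it learns a model (fit), and 2) it applies that model to the data (transform). You need to build a matrix whose columns are all the words in the vocabulary and whose rows correspond to documents. To do that, you need to know the full vocabulary of your corpus (all the columns).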
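The fit/transform split can be sketched in plain Python. This is not scikit-learn's implementation; it only mimics a few `CountVectorizer` defaults (lowercasing, the `(?u)\b\w\w+\b` token pattern, alphabetically sorted columns) to show why the whole vocabulary must be known before any row of the matrix can be built. The function names are illustrative.

```python
import re

# Same regex as CountVectorizer's default token_pattern: words of 2+ characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    """Lowercase the document and split it into tokens."""
    return TOKEN_RE.findall(doc.lower())


def fit(corpus):
    """Learn the vocabulary: map each distinct word to a column index (sorted)."""
    words = sorted({w for doc in corpus for w in tokenize(doc)})
    return {w: i for i, w in enumerate(words)}


def transform(corpus, vocab):
    """Build a documents x vocabulary count matrix; unseen words are ignored."""
    matrix = []
    for doc in corpus:
        row = [0] * len(vocab)
        for w in tokenize(doc):
            j = vocab.get(w)
            if j is not None:
                row[j] += 1
        matrix.append(row)
    return matrix


def fit_transform(corpus):
    """Fit the vocabulary on the corpus, then vectorize the same corpus."""
    vocab = fit(corpus)
    return vocab, transform(corpus, vocab)
```

For example, `fit_transform(["The red dog", "the lazy fox"])` learns the columns `dog, fox, lazy, red, the` and returns one row per document. Passing `list("ab")` (a string split into characters) yields an empty vocabulary, because no single-character token matches the pattern.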
