<p>I want to use sklearn to convert the following phrases into vectors:</p>
<pre><code>Article 1. It is not good to eat pizza after midnight
Article 2. I wouldn't survive a day without stackexchange
Article 3. All of these are just random phrases
Article 4. To prove if my experiment works.
Article 5. The red dog jumps over the lazy fox
</code></pre>
<p>I came up with the following code:</p>
<pre><code>from sklearn.feature_extraction.text import CountVectorizer
vectorizer = CountVectorizer(min_df=1)
n = 0
while n < 5:
    n = n + 1
    a = ('Article %(number)s' % {'number': n})
    print(a)
    with open("LISR2.txt") as openfile:
        for line in openfile:
            if a in line:
                X = line
    print(vectorizer.fit_transform(X))
</code></pre>
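<p>For comparison, here is a minimal sketch that collects the matching lines into a list before vectorizing. <code>fit_transform</code> expects an iterable of documents, whereas <code>X = line</code> binds a single string. The sketch assumes the file holds one article per line, as shown above. An <code>io.StringIO</code> holding the five phrases stands in for <code>LISR2.txt</code> so that the snippet runs on its own. Matching on <code>'Article %d.'</code> with <code>startswith</code> is also an assumption; it keeps <code>Article 1</code> from matching a hypothetical <code>Article 10</code>.</p>
<pre><code>import io
from sklearn.feature_extraction.text import CountVectorizer

# Stand-in for open("LISR2.txt"): one article per line (assumption).
openfile = io.StringIO(
    "Article 1. It is not good to eat pizza after midnight\n"
    "Article 2. I wouldn't survive a day without stackexchange\n"
    "Article 3. All of these are just random phrases\n"
    "Article 4. To prove if my experiment works.\n"
    "Article 5. The red dog jumps over the lazy fox\n"
)
lines = openfile.readlines()

docs = []  # one string per article, in order
for n in range(1, 6):
    prefix = 'Article %d.' % n
    for line in lines:
        if line.startswith(prefix):
            docs.append(line.strip())

vectorizer = CountVectorizer(min_df=1)
X = vectorizer.fit_transform(docs)  # a list of documents, not one string
print(X.shape)  # (number of documents, vocabulary size)
</code></pre>
<p>Fitting once on the whole list also gives every article the same vocabulary. Calling <code>fit_transform</code> inside the loop would refit the vocabulary on each article separately.</p>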
<p>This gives me the following error:</p>
<pre><code>ValueError: Iterable over raw text documents expected, string object received.
</code></pre>
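<p>Here is a minimal reproduction of the error, assuming a recent scikit-learn. <code>CountVectorizer.fit_transform</code> rejects a bare <code>str</code> because it treats its argument as a collection of documents. Wrapping the same line in a list makes it a collection with one document. The sample line is taken from the phrases above.</p>
<pre><code>from sklearn.feature_extraction.text import CountVectorizer

vectorizer = CountVectorizer(min_df=1)
line = "Article 1. It is not good to eat pizza after midnight\n"

raised = False
try:
    vectorizer.fit_transform(line)  # a single str is rejected
except ValueError as err:
    raised = True
    print("ValueError:", err)

X = vectorizer.fit_transform([line])  # a list holding one document works
print(X.shape)  # one row, one column per distinct token
</code></pre>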
<p>Why is this happening? I know this approach should work, because if I enter the phrases directly:</p>
<pre><code>X=("It is not good to eat pizza","I wouldn't survive a day", "All of these")
print(vectorizer.fit_transform(X))
</code></pre>
<p>I get the vectors I want:</p>
<pre><code>(0, 8) 1
(0, 2) 1
(0, 11) 1
(0, 3) 1
(0, 6) 1
(0, 4) 1
(0, 5) 1
(1, 1) 1
(1, 9) 1
(1, 12) 1
(2, 10) 1
(2, 7) 1
(2, 0) 1
</code></pre>
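<p>Each printed entry has the form <code>(row, column) count</code>. The row is the position of the document in <code>X</code>, and the column is the index of a term in the fitted vocabulary. Here is a short sketch that maps the column indices back to words through the fitted <code>vocabulary_</code> attribute, which is a dict from term to column index. Converting to COO format with <code>tocoo()</code> is just one convenient way to iterate over the nonzero entries.</p>
<pre><code>from sklearn.feature_extraction.text import CountVectorizer

X = ("It is not good to eat pizza", "I wouldn't survive a day", "All of these")
vectorizer = CountVectorizer(min_df=1)
counts = vectorizer.fit_transform(X)

# vocabulary_ maps term to column index; invert it to look terms up by column.
index_to_term = {col: term for term, col in vectorizer.vocabulary_.items()}

coo = counts.tocoo()  # exposes parallel row, col and data arrays
for row, col, value in zip(coo.row, coo.col, coo.data):
    print(row, index_to_term[col], value)
</code></pre>
<p>Single-character tokens such as "I", "a" and the "t" in "wouldn't" do not appear in the vocabulary. The default <code>token_pattern</code> only keeps tokens of two or more word characters.</p>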