<p>Now that you have provided the definitions of <code>letters_only</code> and <code>remove_punctuation</code>, we can say that your code is equivalent to:</p>
<pre><code>[lemmatizer.lemmatize(word.lower())
for word in doc.split()
if letters_only(word) and word.lower() not in all_names_lower]
</code></pre>
<p>So all the calls to <code>remove_punctuation</code> are useless, because they are only made when <code>letters_only(word)</code> is true, which implies that <code>word</code> contains no punctuation at all.</p>
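<p>A quick way to check that claim is sketched below. The two helper definitions are assumptions for illustration (the ones from the question are not reproduced in this answer): <code>letters_only</code> is taken to be <code>str.isalpha</code>, and <code>remove_punctuation</code> is taken to strip the characters in <code>string.punctuation</code>.</p>

```python
import string

# Assumed definitions; the question's actual helpers may differ.
def letters_only(word):
    return word.isalpha()

_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def remove_punctuation(word):
    return word.translate(_PUNCT_TABLE)

words = "Hello, world! It's a fine-looking day".split()

# Only purely alphabetic words survive the letters_only filter ...
kept = [w for w in words if letters_only(w)]

# ... and for those words remove_punctuation changes nothing.
unchanged = all(remove_punctuation(w) == w for w in kept)
```

<p>Under these assumptions, every word that passes the filter is returned unchanged by <code>remove_punctuation</code>, so calling it after the filter adds nothing.</p>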
<hr/>
<p>Not really. The best you can do is <code>zip</code> the original list with a generator that removes the punctuation:</p>
<pre><code>original_words = doc.split()
no_punct_words = map(remove_punctuation, original_words)
cleaned_docs.append(' '.join([lemmatizer.lemmatize(no_punct_word.lower())
for word, no_punct_word in zip(original_words, no_punct_words) if letters_only(word)
and no_punct_word not in all_names
and no_punct_word not in all_names_lower]))
</code></pre>
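<p>In any case, your conditions do not make much sense. If the <code>if letters_only(word)</code> condition is true, I would expect <code>remove_punctuation</code> to do nothing to <code>word</code>, so you can drop the call.</p>
<p>Also, the two conditions:</p>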
<pre><code>no_punct_word not in all_names and no_punct_word not in all_names_lower
</code></pre>
<p>could probably become just:</p>
<pre><code>no_punct_word.lower() not in all_names_lower
</code></pre>
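<p>The single check is slightly stricter than the pair it replaces, which is why this is only "probably" a safe substitution. A minimal sketch, assuming hypothetical name sets in which <code>all_names_lower</code> is simply the lowercased <code>all_names</code>:</p>

```python
# Hypothetical name lists, for illustration only.
all_names = {"John", "Mary"}
all_names_lower = {name.lower() for name in all_names}

def two_checks(word):
    # The original pair of conditions.
    return word not in all_names and word not in all_names_lower

def one_check(word):
    # The proposed single condition.
    return word.lower() not in all_names_lower

samples = ["John", "john", "JOHN", "table"]
results = [(w, two_checks(w), one_check(w)) for w in samples]
```

<p>With these sets the two versions agree on <code>"John"</code>, <code>"john"</code> and <code>"table"</code>, and differ only on <code>"JOHN"</code>: the original pair keeps it, while the single check filters it out. That is usually the desired behaviour, but it is worth knowing the change is not purely cosmetic.</p>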
<hr/>
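<p>By the way: if the conditions you want to apply should always be applied to <code>remove_punctuation(word)</code>, then you can do even better by using <code>map</code> to apply that function up front:</p>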
<pre><code>no_punct_words = map(remove_punctuation, doc.split())
# ...
[lemmatizer.lemmatize(word.lower())
for word in no_punct_words if letters_only(word)
and word.lower() not in all_names_lower]
</code></pre>
<p>Perhaps you can do the same with <code>.lower()</code>:</p>
<pre><code>lower_no_punct_words = map(str.lower, map(remove_punctuation, doc.split()))
# ...
[lemmatizer.lemmatize(word)
for word in lower_no_punct_words if letters_only(word)
and word not in all_names_lower]
</code></pre>
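<p>For reference, here is a self-contained sketch of that last variant. Everything outside the <code>map</code> pipeline is an assumption for illustration: the helper definitions, the name set, the sample <code>doc</code>, and <code>StubLemmatizer</code>, a stand-in so the example runs without any external data.</p>

```python
import string

# Assumed helpers; the question's actual definitions may differ.
def letters_only(word):
    return word.isalpha()

_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def remove_punctuation(word):
    return word.translate(_PUNCT_TABLE)

class StubLemmatizer:
    """Hypothetical stand-in for the real lemmatizer (lookup table only)."""
    _LEMMAS = {"cats": "cat", "dogs": "dog"}

    def lemmatize(self, word):
        return self._LEMMAS.get(word, word)

lemmatizer = StubLemmatizer()
all_names_lower = {"john", "mary"}  # hypothetical name set
doc = "Mary saw 2 cats, and John fed the dogs!"

# Strip punctuation first, then lowercase; both maps are lazy in Python 3.
lower_no_punct_words = map(str.lower, map(remove_punctuation, doc.split()))

cleaned = ' '.join(
    lemmatizer.lemmatize(word)
    for word in lower_no_punct_words
    if letters_only(word) and word not in all_names_lower
)
```

<p>Note that the order of operations changes the result: because punctuation is removed before <code>letters_only</code> is tested, a token such as <code>cats,</code> is now kept as <code>cats</code>, whereas the code at the top of this answer would discard it.</p>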