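I am using the Python code below to count words in text (.txt) files and to check whether any word in those files belongs to one of my word lists (loaded from .csv files):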
import re
import collections
from collections import Counter
import csv
import sys

find_words = re.compile(r'(?<!\S)[A-Za-z]+(?!\S)').findall
wanted1 = set(find_words(open('word_list1.csv').read().lower()))
wanted2 = set(find_words(open('word_list2.csv').read().lower()))

# myfile, file1 and file2 are output file objects opened elsewhere (not shown here)
for f in sys.argv[1:]:
    cnt1 = cnt2 = cntWords = 0
    WANTED = 20
    with open(f) as inputfile:
        for line in inputfile:
            for word in find_words(line.lower()):
                myfile.write(word + "\n")
                cntWords += 1
                if word in wanted1:
                    file1.write(word + "\n")
                    cnt1 += 1
                if word in wanted2:
                    file2.write(word + "\n")
                    cnt2 += 1
At the moment, I am counting every word in the .txt file that happens to belong to the word lists wanted1 and wanted2.
I want to count these words only when there is no negator within a distance of three words from them.
A negator is any one of the following three words: no, not, never.
In other words, if a negator is within the distance [-3,+3] words from the word I am examining, the word should not be counted even if it belongs to one of the word lists I am checking.
Do you know how I can implement this in my code? Thanks.
Example 1:
Words within three positions of the negator should not be counted, even if they belong to the word lists. The negator could equally be "never" or "no".
Example 2:
never Word-2 Word-1 Word0 Word1 Word2
-> Word-2 Word-1 Word0 should not be counted; Word1 Word2 should be counted (if they belong to the csv word lists). The negator could equally be "not" or "no".
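I wrote a small script that does something similar to what you are asking for. I implemented the contents of the .txt file as a multi-line string and hard-coded the word lists to keep this example simple. You can replace those bits with file open/read code. This is probably a very inefficient solution, but it is the clearest way I could organize it in my head. Feel free to optimize it however you like.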
The result is as follows:
^{pr2}$
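The windowing rule from the question (skip a word when no, not, or never appears within three words on either side) can be sketched roughly as below. This is a minimal illustration, not the script from the answer: the function name `count_unnegated`, the sample text, and the contents of the hard-coded word lists are all hypothetical stand-ins for the .txt and .csv files. It also assumes the whole text is tokenized at once, so a negator at the end of one line still affects words at the start of the next.

```python
import re
from collections import Counter

NEGATORS = {"no", "not", "never"}
DISTANCE = 3  # a negator blocks words up to three positions away on either side

# Same tokenizer as in the question: whitespace-delimited, letters-only tokens.
find_words = re.compile(r"(?<!\S)[A-Za-z]+(?!\S)").findall


def count_unnegated(text, wanted, negators=NEGATORS, distance=DISTANCE):
    """Count words from `wanted` that have no negator within `distance` words."""
    words = find_words(text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word not in wanted:
            continue
        # Slice covering positions i-distance .. i+distance, clipped to the text.
        # The slice includes the word itself, which only matters if a negator
        # is also listed in `wanted`.
        window = words[max(0, i - distance): i + distance + 1]
        if not negators.intersection(window):
            counts[word] += 1
    return counts


# Hypothetical sample data mirroring Example 2 from the question.
text = """never alpha beta gamma delta epsilon"""
wanted1 = {"alpha", "gamma", "delta"}
wanted2 = {"beta", "epsilon"}

print(count_unnegated(text, wanted1))
print(count_unnegated(text, wanted2))
```

With this sample, alpha, beta, and gamma sit within three words of never and are skipped, so only delta and epsilon are counted. Distance here is measured over the tokens the regex keeps, so a token with attached punctuation (such as "not,") is neither treated as a negator nor counted as a position; whether that is the desired behavior depends on the input files.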