How do I find and count multiple intersections between a list and a text?

# set counters to zero
lines, blanklines, sentences, words = 0, 0, 0, 0
print('-' * 50)

while True:
    try:
        # ask for the text file
        filename = input("Please enter filename: ")
        textf = open(filename, 'r')
        break
    except IOError:
        print('Cannot open file "%s" ' % filename)

# read one line at a time
for line in textf:
    print(line, )  # test
    lines += 1
    if line.startswith('\n'):
        blanklines += 1
    else:
        # a sentence ends with . or ! or ?
        # count these characters
        sentences += line.count('.') + line.count('!') + line.count('?')
        # create a list of words
        # use None to split at any whitespace regardless of length
        tempwords = line.split(None)
        print(tempwords)
        # total words
        words += len(tempwords)

# anglicisms
words1 = set(open(filename).read().split())
words2 = set(open("anglicisms.txt").read().split())
duplicates = words1.intersection(words2)

textf.close()

print('-' * 50)
print("Lines : ", lines)
print("Blank lines : ", blanklines)
print("Sentences : ", sentences)
print("Words : ", words)
print("Anglicisms : %d:%s" % (len(duplicates), duplicates))

2 answers

User

#1 · edited 2024-04-26 10:01:13

I would do it like this:

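from collections import Counter

# A set makes each membership test O(1) instead of scanning a list.
anglicisms = set(open("anglicisms.txt").read().split())

matches = []
for line in textf:  # textf is the file opened in the question's code
    matches.extend(word for word in line.split() if word in anglicisms)

# Maps each anglicism found in the text to its number of occurrences.
anglicismsInText = Counter(matches)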

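As for the second question, I find it a bit tricky. Taking your example, "big" is an anglicism and "bigfoot" should match, but what about "Abigail"? Or "overbig"? Should a word match whenever an anglicism appears anywhere in the string? Only at the beginning? Only at the end? Once you have decided that, you should build a regular expression that matches accordingly.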
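
One way to make that choice concrete is to write each policy as its own pattern. The sketch below is only illustrative: the helper names `build_pattern` and `matches` and the mode names are made up for this example, and case-insensitive matching is an assumption rather than something the question specifies.

```python
import re

def build_pattern(anglicisms, mode):
    """Compile a regex for one matching policy (hypothetical helper)."""
    # Escape each anglicism so characters like '.' are matched literally.
    alternation = "|".join(re.escape(a) for a in anglicisms)
    templates = {
        "anywhere": r"\w*(?:{0})\w*",  # anglicism anywhere inside the word
        "prefix": r"(?:{0})\w*",       # word starts with an anglicism
        "suffix": r"\w*(?:{0})",       # word ends with an anglicism
    }
    return re.compile(templates[mode].format(alternation), re.IGNORECASE)

def matches(word, anglicisms, mode):
    # fullmatch requires the whole word to fit the pattern.
    return build_pattern(anglicisms, mode).fullmatch(word) is not None

for mode in ("anywhere", "prefix", "suffix"):
    hits = [w for w in ("big", "bigfoot", "Abigail") if matches(w, ["big"], mode)]
    print(mode, hits)
```

With "big" as the only anglicism, "Abigail" matches only under the "anywhere" policy, while "bigfoot" matches under both "anywhere" and "prefix".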

Edit: to match strings that start with an anglicism, do the following:

def derivatesFromAnglicism(word):
    # True if the word starts with any known anglicism, e.g. "bigfoot" for "big".
    return any(word.startswith(a) for a in anglicisms)

matches.extend(word for word in line.split() if derivatesFromAnglicism(word))

User

#2 · edited 2024-04-26 10:01:13

This will solve your first question:

anglicisms = ["a", "b", "c"]
words = ["b", "b", "b", "a", "a", "b", "c", "a", "b", "c", "c", "c", "c"]

# sorted() returns a list; in Python 3, map() returns an iterator that has no .sort().
results = sorted(((angli, words.count(angli)) for angli in anglicisms), key=lambda p: -p[1])

The result is:

[('b', 5), ('c', 5), ('a', 3)]

For the second question, I think the right approach is to use regular expressions.
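
A minimal sketch of that regex approach, assuming "match" means that a word starts with an anglicism. The sample list and sentence are made up for illustration and do not come from `anglicisms.txt`.

```python
import re
from collections import Counter

anglicisms = ["big", "cool"]  # illustrative list, not the real anglicisms.txt
text = "A big bigfoot met Abigail. Cool, said the cooler bigfoot!"

# Put longer anglicisms first so a shorter one cannot shadow them.
ordered = sorted(anglicisms, key=len, reverse=True)

# \b anchors at the start of a word; the group captures which anglicism matched.
pattern = re.compile(
    r"\b(" + "|".join(re.escape(a) for a in ordered) + r")\w*",
    re.IGNORECASE,
)

# Count matches per anglicism, normalising case.
counts = Counter(m.group(1).lower() for m in pattern.finditer(text))
print(counts.most_common())
```

Because of the `\b` anchor, "Abigail" is not counted here. Dropping the anchor would switch to matching an anglicism anywhere inside a word.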
