For each occurrence of a word in a text file, extract about 5 words on either side

Posted 2024-05-23 22:38:26


For every occurrence of a given word, I need to show its context by displaying about 5 words before and after it.

Example output for the word "stranger" in the text file when calling occurs('stranger', 'movie.txt'):

My code so far:

def occurs(word, filename):

    infile = open(filename,'r')
    lines = infile.read().splitlines()
    infile.close()

    wordsString = ''.join(lines)
    words = wordsString.split()
    print(words)

    for i in range(len(words)):
        if words[i].find(word):
            pass  # stuck here

3 answers

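This finds the index of every occurrence of the word in words, the list of all words in the file. It then uses slicing to get the matching word together with the 5 words before and after it.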

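def occurs(word, filename):
    # split() with no argument splits on any whitespace, newlines included,
    # so words on either side of a line break are not glued together.
    with open(filename, 'r') as infile:
        words = infile.read().split()

    # Indices of every word containing the target, compared case-insensitively.
    target = word.lower()
    matches = [i for i, w in enumerate(words) if w.lower().find(target) != -1]

    for m in matches:
        # Clamp the start at 0; a negative start would wrap to the end of the list.
        l = " ".join(words[max(0, m - 5):m + 6])
        print(f"... {l} ...")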
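If you would rather collect the snippets than print them (for example to test them or write them elsewhere), a minimal sketch along the same lines could return a list instead. The function name contexts and its width parameter are hypothetical, not part of the answer above; the sample text is just the "stranger" passage used elsewhere in this thread, written to a temporary file.

```python
import os
import tempfile


def contexts(word, filename, width=5):
    """Return one '... snippet ...' string per word that contains `word`.

    Hypothetical helper: same slicing idea as occurs(), but it returns the
    snippets instead of printing them.
    """
    with open(filename, "r", encoding="utf-8") as infile:
        # split() with no argument splits on any whitespace, newlines included.
        words = infile.read().split()

    target = word.lower()
    snippets = []
    for i, w in enumerate(words):
        if target in w.lower():
            start = max(0, i - width)  # a negative start would wrap to the end
            snippets.append("... " + " ".join(words[start:i + width + 1]) + " ...")
    return snippets


if __name__ == "__main__":
    text = ("But we did not answer him, for he was a stranger and we were not used to,\n"
            "strangers and were shy of them.\n")
    # Write the sample to a temporary file so the example is self-contained.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                     encoding="utf-8") as f:
        f.write(text)
        path = f.name
    try:
        for line in contexts("stranger", path):
            print(line)
    finally:
        os.remove(path)
```

Because the match is a substring test, "strangers" is reported too; compare whole words (w.lower() == target) if that is not what you want.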

Consider the more_itertools.adjacent tool.

Given

import more_itertools as mit


s = """\
But we did not answer him, for he was a stranger and we were not used to, strangers and were shy of them.
We were simple folk, in our village, and when a stranger was a pleasant person we were soon friends.
"""

word, distance = "stranger", 5
words = s.splitlines()[0].split()

Demo


Details

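more_itertools.adjacent returns an iterable of (bool, item) tuples. A word is paired with True if it satisfies the predicate or lies within distance positions of a word that does; every other word is paired with False. Example: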

>>> neighbors
[(False, 'But'),
 ...
 (True, 'a'),
 (True, 'stranger'),
 (True, 'and'),
 ...
 (False, 'to,')]

From this result, keep only the pairs flagged True to get the target word together with its neighbors.

Note: more_itertools is a third-party library. Install it with: pip install more-itertools
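If you cannot add a dependency, the (bool, item) marking described above can be approximated in plain Python. This is a sketch under stated assumptions: mark_neighbors is a hypothetical name, it is not the more_itertools implementation (it builds a list instead of streaming lazily), and the predicate x == "stranger" is an illustrative choice that matches whole words only.

```python
def mark_neighbors(pred, items, distance=1):
    """Pair each item with True if it satisfies pred or lies within
    `distance` positions of an item that does, else False.

    Hypothetical stand-in for the (bool, item) pairs described above.
    """
    items = list(items)
    flags = [False] * len(items)
    for h, x in enumerate(items):
        if pred(x):
            # Mark the window [h - distance, h + distance], clipped to the bounds.
            for j in range(max(0, h - distance), min(len(items), h + distance + 1)):
                flags[j] = True
    return list(zip(flags, items))


words = ("But we did not answer him, for he was a stranger "
         "and we were not used to, strangers and were shy of them.").split()
neighbors = mark_neighbors(lambda x: x == "stranger", words, distance=5)

# Keep only the flagged words: the target plus up to 5 words on each side.
print([w for flag, w in neighbors if flag])
# ['him,', 'for', 'he', 'was', 'a', 'stranger', 'and', 'we', 'were', 'not', 'used']
```

With distance=1 only 'a', 'stranger' and 'and' are flagged, which matches the shape of the listing above.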

I suggest slicing words based on i:

print(words[i-5:i+6])

(this goes where your comment is)

Or, to print it as in the example:

print("...", " ".join(words[i-5:i+6]), "...")

To account for the word appearing within the first 5 positions:

if i > 5:
    print("...", " ".join(words[i-5:i+6]), "...")
else:
    print("...", " ".join(words[0:i+6]), "...")
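The else branch matters because a negative slice start counts from the end of the list rather than stopping at 0. A small sketch with made-up placeholder words shows the silent failure and a one-expression alternative using max(0, ...):

```python
words = "alpha beta gamma delta epsilon zeta eta theta iota kappa".split()
i = 2  # the target "gamma" sits within the first 5 positions

# i - 5 == -3, which means len(words) - 3 == 7, so this is words[7:8].
print(words[i - 5:i + 6])
# ['theta']

# Clamping the start at 0 gives the intended window without an if/else.
print(words[max(0, i - 5):i + 6])
# ['alpha', 'beta', 'gamma', 'delta', 'epsilon', 'zeta', 'eta', 'theta']
```

The end index needs no clamping: slicing past the end of a list simply stops at the last element.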

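Also, find doesn't do what you think it does. find() returns the index where the substring starts, or -1 if it isn't found. In an if statement -1 evaluates to True, while a match at the very start of the word (index 0) evaluates to False, so your test is effectively backwards. Try: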

if word in words[i].lower():
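A quick sketch of the truthiness problem described above, using a few sample words from the "stranger" text; the values follow directly from str.find returning an index or -1:

```python
word = "stranger"

# str.find returns the start index of the match, or -1 when there is none.
print("stranger".find(word))  # 0  -> falsy, so a real match fails the if
print("him,".find(word))      # -1 -> truthy, so a non-match passes the if

# The membership test returns a real bool and reads as intended.
print(word in "Stranger".lower())  # True
print(word in "him,")              # False
```

If you do want find, compare explicitly: words[i].lower().find(word) != -1.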
