Text search with Python

2 votes

1 answer

1117 views

Asked 2025-04-18 14:06

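I'm working on a text search project and use a tool called TextBlob to find sentences in a text. TextBlob is very effective at finding every sentence that contains a keyword. For my analysis, though, I would also like to extract the sentence before and the sentence after each matching sentence, and I haven't found a way to do that.

Here is the code I'm using: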

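from textblob import TextBlob

def extraxt_sents(Text, word):
    # Comma-separated keywords become a set so they can be intersected with each sentence
    search_words = set(word.split(','))
    # Lowercase the text so that matching is case-insensitive
    sents = ''.join([s.lower() for s in Text])
    blob = TextBlob(sents)
    # Keep every sentence that shares at least one word with the keywords
    matches = [str(s) for s in blob.sentences if search_words & set(s.words)]
    print(search_words)
    print(matches)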

Tags: text-processing, text-search, natural-language-processing, keyword-extraction, textblob

1 Answer

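If you want the lines before and after a match, you can either write a loop that remembers the previous line, or use slicing, that is, the [from:to] syntax applied to the blob.sentences list.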
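The first option, a loop that remembers the previous sentence, could look roughly like the sketch below. It is only an illustration: it works on a plain list of sentence strings rather than on blob.sentences, and the helper words_of (a regex tokenizer standing in for TextBlob's s.words) and the function name matches_with_neighbours are made up for this example.

```python
import re

def words_of(sentence):
    # Hypothetical stand-in for TextBlob's `sentence.words`: lowercase word tokens.
    return set(re.findall(r"\w+", sentence.lower()))

def matches_with_neighbours(sentences, search_words):
    """Return [previous, match, next] lists; a missing neighbour is None."""
    results = []
    prev = None            # the sentence seen on the previous iteration
    prev_matched = False   # True if the previous sentence was a match
    for sent in sentences:
        if prev_matched:
            # The current sentence is the "next" neighbour of the last match.
            results[-1][2] = sent
        prev_matched = bool(search_words & words_of(sent))
        if prev_matched:
            results.append([prev, sent, None])
        prev = sent
    return results

sentences = ["The sky is blue.", "I like python a lot.", "It is fun.", "Bye."]
print(matches_with_neighbours(sentences, {"python"}))
```

With real TextBlob objects, the same loop should work if words_of(sent) is replaced by set(sent.words) and the results are converted with str, but that variant is not tested here.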

The best approach is probably the built-in enumerate function.

match_region = [list(map(str, blob.sentences[i-1:i+2]))  # from previous to next sentence
                for i, s in enumerate(blob.sentences)    # i is the index, s is the element
                if search_words & set(s.words)]          # same as your condition

Here, blob.sentences[i-1:i+2] extracts a sublist from index i-1 (inclusive) to index i+2 (exclusive), and map converts each element of that sublist to a string.

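Note: you will probably want to replace i-1 with max(0, i-1). Otherwise, for a match at index 0, i-1 becomes -1, which Python interprets as the last element, and the slice comes out empty (for any list with at least three sentences) instead of starting at the first sentence. At the other end, i+2 running past the end of the list is not a problem, because a slice is simply clipped to the length of the list.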
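To make the edge case concrete, here is a small sketch of the enumerate approach with the clamped start index. It is an illustration under assumptions, not TextBlob code: it uses a plain list of strings in place of blob.sentences, and words_of and match_regions are hypothetical names introduced for this example.

```python
import re

def words_of(sentence):
    # Hypothetical stand-in for TextBlob's `sentence.words`: lowercase word tokens.
    return set(re.findall(r"\w+", sentence.lower()))

def match_regions(sentences, search_words):
    # max(0, i - 1) keeps the start index from wrapping around to the end of the list.
    # The stop index i + 2 may exceed len(sentences); slicing clips it automatically.
    return [sentences[max(0, i - 1):i + 2]
            for i, s in enumerate(sentences)
            if search_words & words_of(s)]

sentences = ["Alpha comes first.", "Beta follows.", "Gamma is last."]

# Unclamped slice for a match at index 0: start is -1, stop is 2.
print(sentences[-1:2])                      # []
# Clamped version keeps the match and the sentence after it.
print(match_regions(sentences, {"alpha"}))  # [['Alpha comes first.', 'Beta follows.']]
# A match on the last sentence: the stop index 4 is clipped to the list length.
print(match_regions(sentences, {"gamma"}))  # [['Beta follows.', 'Gamma is last.']]
```

A match at either end of the text therefore yields a two-sentence region, and a match in the middle yields three.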

Answered 2025-04-18 by Python大师
