Python: truncate text around a keyword


Asked 2025-04-16 07:50

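I have a body of text in which I want to find a keyword or phrase and return only the text just before and after it. Google does exactly this in its search results.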

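Here is a piece of text I found online:

"This filter would truncate words like the original Django filter, but instead of being based on the number of words, it's based on the number of characters. I found the need for this when building a website where i'd have to show labels on very small text boxes, and truncating by words didn't always give me the best results (and truncating by chars is... well... not that elegant)."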

Say I search this text for the phrase building a website; the output might then look like this:

"... I found the need for this when building a website where i'd have to show ..."

Edit: I should have been clearer. This needs to work for many strings/phrases, not just this one.


4 Answers

>>> re.search(r'((?:\S+\s+){,5}\bbuilding a website\b(?:\s+\S+){,5})', s).groups()
("the need for this when building a website where i'd have to show",)
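The one-liner above handles a single, hard-coded phrase. Since the question asks for this to work with many phrases, here is a minimal sketch that wraps the same word-counting pattern in a function. The name `words_around` and the default of 5 words are assumptions for illustration. The phrase is passed through `re.escape` so it is matched literally, and `re.finditer` returns every occurrence instead of only the first. (`{,5}` and `{0,5}` mean the same thing in Python's `re`.)

```python
import re


def words_around(text, phrase, n=5):
    """Return each occurrence of `phrase` with up to `n` words on either side.

    Hypothetical helper: same idea as the one-liner above, but the phrase is
    escaped (matched literally) and every non-overlapping match is returned.
    The \\b anchors assume the phrase starts and ends with a word character.
    """
    pattern = r'(?:\S+\s+){0,%d}\b%s\b(?:\s+\S+){0,%d}' % (n, re.escape(phrase), n)
    return [m.group(0) for m in re.finditer(pattern, text)]
```

Because the matches cannot overlap, two occurrences that sit within `n` words of each other come back as a single excerpt; the second one is swallowed by the first one's trailing context.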


Answered 2025-04-16 by Python大师


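Find the index of the phrase you want, then slice out N characters before and after that index. You can be a little smarter about it and look for the nearest space about N characters away from the index, so that you only get whole words.

These Python string functions can help you find what you need: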

http://docs.python.org/py3k/library/strings.html
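A minimal sketch of the index-and-slice approach described above, assuming words are separated by ordinary spaces and that only the first occurrence matters. `snippet` and its `width` parameter are hypothetical names. It uses `str.find` to locate the phrase, slices `width` characters on either side, then pulls each edge in to the nearest space so no word is cut in half.

```python
def snippet(text, phrase, width=30):
    """Return an excerpt of `text` around the first occurrence of `phrase`.

    Hypothetical helper: slices `width` characters on each side of the match,
    then moves each edge in to the nearest space so no word is cut in half.
    Returns None if the phrase does not occur (the search is case-sensitive).
    """
    start = text.find(phrase)
    if start == -1:
        return None
    end = start + len(phrase)
    lo = max(0, start - width)
    hi = min(len(text), end + width)
    # If the left edge lands mid-word, skip forward past the next space
    # (or drop the partial word entirely if no space precedes the match).
    if lo > 0 and text[lo - 1] != ' ':
        cut = text.find(' ', lo, start)
        lo = cut + 1 if cut != -1 else start
    # Likewise, if the right edge lands mid-word, back up to the previous space.
    if hi < len(text) and text[hi] != ' ':
        cut = text.rfind(' ', end, hi)
        hi = cut if cut != -1 else end
    excerpt = text[lo:hi].strip()
    prefix = '... ' if lo > 0 else ''
    suffix = ' ...' if hi < len(text) else ''
    return prefix + excerpt + suffix
```

Tabs and newlines are not treated as word breaks here; splitting on any whitespace would need `str.split` or a regex instead.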

Answered 2025-04-16 by Python大师


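Building on the other answers (especially cababunga's), I wanted a function that takes up to 25 characters of context (or however many you ask for) and stops at the last word boundary, which gives a nice clean match: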

import re

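def find_with_context(haystack, needle, context_length, escape=True):
    """Return a list of three-tuples: (context before, match, context after)."""
    if escape:
        # Treat the needle as literal text, not as a regular expression.
        needle = re.escape(needle)
    # Up to context_length characters on each side; the \b anchors make both
    # edges land on word boundaries instead of cutting a word in half.
    pattern = r'\b(.{,%d})\b(%s)\b(.{,%d})\b' % (context_length, needle, context_length)
    return re.findall(pattern, haystack)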

Usage:

>>> find_with_context(s, 'building a website', 25)
[(' the need for this when ', 'building a website', " where i'd have to show ")]
>>> # Compare this to what it would be without making sure it ends at word boundaries:
... # [('d the need for this when ', 'building a website', " where i'd have to show l")]
...
>>> for match in find_with_context(s, 'building a website', 25):
...     print('<p>...%s<strong>%s</strong>%s...</p>' % match)
... 
<p>... the need for this when <strong>building a website</strong> where i'd have to show ...</p>
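Since the question needs this for many phrases, here is a Python 3 sketch that reuses the same `find_with_context` regex for a list of needles. `highlight_all` is a hypothetical helper, and escaping the context with `html.escape` is an addition of mine, not part of the original answer; it keeps stray `<` or `&` in the source text from breaking the generated markup.

```python
import html
import re


def find_with_context(haystack, needle, context_length, escape=True):
    """Return (context before, match, context after) tuples, as in the answer above."""
    if escape:
        needle = re.escape(needle)
    pattern = r'\b(.{,%d})\b(%s)\b(.{,%d})\b' % (context_length, needle, context_length)
    return re.findall(pattern, haystack)


def highlight_all(haystack, needles, context_length=25):
    """Hypothetical helper: one HTML snippet per match, for every needle in turn."""
    snippets = []
    for needle in needles:
        for before, match, after in find_with_context(haystack, needle, context_length):
            # quote=False: only &, < and > need escaping inside element content.
            parts = [html.escape(p, quote=False) for p in (before, match, after)]
            snippets.append('<p>...%s<strong>%s</strong>%s...</p>' % tuple(parts))
    return snippets
```

Because `.` does not match a newline unless `re.DOTALL` is passed, the context never extends across line breaks; for multi-line text you may want to collapse whitespace first.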

Answered 2025-04-16 by Python大师

