Clichés

1 answer

User

#1 · Posted 2024-06-16 16:03:20

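You're looking for spaCy's Matcher, which is described in the spaCy documentation. It can find arbitrarily long or complex token sequences for you, and it is easy to parallelize (see the matcher documentation on pipe()). By default it returns the positions of the matches in the text, although you can do whatever you like with the matched tokens, and you can also add an on_match callback function.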
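To make the on_match callback mentioned above concrete, here is a minimal sketch. It assumes spaCy 3.x, where patterns are passed as a list and the callback as a keyword argument; the helper name collect_cliche and the found list are made up for illustration. A blank English pipeline is used because LOWER patterns only need the tokenizer.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: spaCy 3.x. A blank pipeline only tokenizes, which is
# enough for patterns that test the LOWER attribute.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

found = []  # hypothetical container, filled by the callback


def collect_cliche(matcher, doc, i, matches):
    """Called once per match; matches[i] is (match_id, start, end)."""
    match_id, start, end = matches[i]
    found.append(doc[start:end].text)


pattern = [{"LOWER": "abandon"}, {"LOWER": "ship"}]
matcher.add("CLICHE", [pattern], on_match=collect_cliche)

doc = nlp("We must abandon ship!")
matches = matcher(doc)  # still returns the (match_id, start, end) tuples
print(found)
```

The callback runs in addition to the normal return value, so it is mainly useful when you want a side effect per match (logging, tagging, counting) without looping over the results yourself.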

That said, I think your use case is fairly simple. I've included an example to get you started.

import spacy
from spacy.matcher import Matcher

# spaCy 2.x API, as in the original answer ('en' is a 2.x shortcut link)
nlp = spacy.load('en')

cliches = ['Abandon ship',
           'About face',
           'Above board',
           'All ears']

# One pattern per cliche: match each token case-insensitively via LOWER
cliche_patterns = [[{'LOWER': token.text.lower()} for token in nlp(cliche)] for cliche in cliches]

matcher = Matcher(nlp.vocab)
for counter, pattern in enumerate(cliche_patterns):
    # 2.x signature: add(key, on_match, *patterns); None means no callback
    matcher.add("Cliche " + str(counter), None, pattern)

example_1 = nlp("Turn about face!")
example_2 = nlp("We must abandon ship! It's the only way to stay above board.")

# Each match is a (match_id, start, end) tuple of token indices
matches_1 = matcher(example_1)
matches_2 = matcher(example_2)

for match in matches_1:
    print(example_1[match[1]:match[2]])

print("    ")
for match in matches_2:
    print(example_2[match[1]:match[2]])

Output:
about face
    
abandon ship
above board

Just make sure you have a recent version of spaCy (2.0.0+), since the matcher API changed recently.
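Because the API has kept changing, here is a hedged sketch of the same idea on spaCy 3.x, where Matcher.add takes a list of patterns and the 'en' shortcut no longer exists. It is an assumption that you are on 3.x; the use of spacy.blank("en") and of nlp.pipe for batching are choices made for this illustration, not part of the original answer.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: spaCy 3.x. The tokenizer alone is enough for LOWER patterns.
nlp = spacy.blank("en")

cliches = ["Abandon ship", "About face", "Above board", "All ears"]

matcher = Matcher(nlp.vocab)
for cliche in cliches:
    pattern = [{"LOWER": token.lower_} for token in nlp(cliche)]
    # 3.x signature: add(key, patterns); 2.x used add(key, None, pattern)
    matcher.add(cliche, [pattern])

texts = [
    "Turn about face!",
    "We must abandon ship! It's the only way to stay above board.",
]

# nlp.pipe streams the texts through the pipeline in batches
results = []
for doc in nlp.pipe(texts):
    hits = [
        (nlp.vocab.strings[match_id], doc[start:end].text)
        for match_id, start, end in matcher(doc)
    ]
    results.append(hits)

for text, hits in zip(texts, results):
    print(text, "->", hits)
```

Using the cliché itself as the match key means each hit reports which phrase it came from. For large corpora, nlp.pipe also accepts an n_process argument to spread the work over several processes.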
