Pattern-based punctuation with spaCy

import spacy
from spacy.matcher import Matcher

# Load the small English pipeline
nlp = spacy.load('en_core_web_sm')
matcher = Matcher(nlp.vocab)

# Match three consecutive nouns
Punctuation_patterns = [[{'POS': 'NOUN'}, {'POS': 'NOUN'}, {'POS': 'NOUN'}]]
# spaCy v3 signature (spaCy v2 used matcher.add('PUNCTUATION', None, *Punctuation_patterns))
matcher.add('PUNCTUATION', Punctuation_patterns)

doc = nlp("The cat cat cat sat on the mat. The dog sat on the mat.")
matches = matcher(doc)

spans = []
for match_id, start, end in matches:
    span = doc[start:end]  # the matched slice of the doc
    spans.append({'start': span.start_char, 'end': span.end_char})

layer1 = ' '.join(
    '"{}"'.format(span.text) if token.dep_ == 'ROOT' else token.text
    for token in doc
)
print(layer1)


1 answer

User

Answer #1 · Posted 2024-05-26 11:54:24

You can use:

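# Start from the original text and quote each matched span in turn
result = doc.text
for match_id, start, end in matches:
    span = doc[start:end]
    # Replace only the first occurrence of the span's text with a quoted copy
    result = result.replace(span.text, f'"{span.text}"', 1)
print(result)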

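That is, define a variable `result` to hold the output and initialize it with `doc.text`. Then loop over the matches and replace each matched span with the same span text wrapped in double quotes.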
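One caveat with `str.replace(..., 1)`: it replaces the first occurrence of the text anywhere in the string, not the occurrence at the match's position. If the same span text appears twice, the second replacement lands inside the already-quoted first one and yields `""cat cat cat""`. Overlapping matches (for example a two-noun pattern on "cat cat cat") cause similar trouble. A minimal sketch of an offset-based alternative follows, assuming the matches have already been turned into `(start_char, end_char)` pairs; the helper name `quote_spans` is hypothetical, not part of spaCy.

```python
def quote_spans(text, offsets):
    """Wrap each (start_char, end_char) range of `text` in double quotes.

    Hypothetical helper. Overlapping ranges are dropped, keeping the longest
    (then earliest) one, similar in spirit to spacy.util.filter_spans.
    """
    # Prefer longer ranges, then earlier ones, when choosing among overlaps.
    candidates = sorted(set(offsets), key=lambda o: (-(o[1] - o[0]), o[0]))
    kept = []
    for start, end in candidates:
        # Keep a range only if it does not overlap any range already kept.
        if all(end <= s or start >= e for s, e in kept):
            kept.append((start, end))

    # Insert quotes from right to left so earlier offsets stay valid.
    result = text
    for start, end in sorted(kept, reverse=True):
        result = result[:start] + '"' + result[start:end] + '"' + result[end:]
    return result


if __name__ == "__main__":
    text = "The cat cat cat sat on the mat. The cat cat cat ran."
    print(quote_spans(text, [(4, 15), (36, 47)]))
```

With spaCy, the offsets come straight from the matches, e.g. `quote_spans(doc.text, [(doc[s:e].start_char, doc[s:e].end_char) for _, s, e in matches])`; these are the same `start_char`/`end_char` values the question already collects in `spans`. Working on character offsets rather than searching for text means each quote is placed exactly where the matcher found the span.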
