Python: Extracting subjects and their dependent phrases from text

Published 2024-05-23 14:32:15


I am trying to follow the thread "How to extract subjects in a sentence and their respective dependent phrases?". I also want to extract the subjects and their dependents from a piece of text.

import spacy
from textpipeliner import PipelineEngine, Context
from textpipeliner.pipes import *

text = 'No Offline Maps! It used to have offline maps but they disappeared. It now has a menu option to watch a video in exchange for maps but it never downloads the map. Makes the app useless to me.'

pipes_structure = [
    SequencePipe([
        FindTokensPipe("VERB/nsubj/*"),
        NamedEntityFilterPipe(),
        NamedEntityExtractorPipe()
    ]),
    FindTokensPipe("VERB"),
    AnyPipe([
        SequencePipe([
            FindTokensPipe("VBD/dobj/NNP"),
            AggregatePipe([
                NamedEntityFilterPipe("GPE"),
                NamedEntityFilterPipe("PERSON")
            ]),
            NamedEntityExtractorPipe()
        ]),
        SequencePipe([
            FindTokensPipe("VBD/**/*/pobj/NNP"),
            AggregatePipe([
                NamedEntityFilterPipe("LOC"),
                NamedEntityFilterPipe("PERSON")
            ]),
            NamedEntityExtractorPipe()
        ])
    ])
]

engine = PipelineEngine(pipes_structure, Context(text), [0, 1, 2])
engine.process()

When I run the code above, it throws the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-22-5f5a5c9e8e51> in <module>()
----> 1 engine = PipelineEngine(pipes_structure, Context(text), [0, 1, 2])
      2 engine.process()

~/anaconda3/lib/python3.6/site-packages/textpipeliner/context.py in __init__(self, doc)
      4         self._current_sent_idx = -1
      5         self._paragraph = self._sents[0:9]
----> 6         for s in doc.sents:
      7             self._sents.append(s)
      8         self.doc = doc

AttributeError: 'str' object has no attribute 'sents'

I am not sure where I went wrong. Can anyone help me fix this?


2 Answers

Interesting library.

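Your Context needs to be given a different kind of object: it expects a parsed spaCy document, not a plain string. The error message says exactly that. Check the package's official example: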

nlp = spacy.load("en")
text = nlp('No Offline Maps! It used to have offline maps but they disappeared. It now has a menu option to watch a video in exchange for maps but it never downloads the map. Makes the app useless to me.')

It looks like you are passing a plain string as the text variable in this line:

engine = PipelineEngine(pipes_structure, Context(text), [0, 1, 2])

Replace the text = '...' assignment in your code with:

nlp = spacy.load("en")
text = nlp('No Offline Maps! It used to have offline maps but they disappeared. It now has a menu option to watch a video in exchange for maps but it never downloads the map. Makes the app useless to me.')

because that is what they do in the post you referenced.

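That way text is no longer a string. It is whatever type the nlp call returns (a spaCy Doc, which has a sents attribute), so the second-to-last line, where Context(text) is built, works.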
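To see why the string fails, here is a minimal sketch that reproduces the error using only the standard library. ContextSketch and FakeDoc are hypothetical stand-ins written for this illustration; they are not the textpipeliner or spaCy source. ContextSketch only copies the loop visible in the traceback, and FakeDoc only imitates the one attribute that loop needs.

```python
# Hypothetical stand-ins for illustration only; not textpipeliner or spaCy code.


class ContextSketch:
    """Mimics the loop shown in the traceback: it iterates over doc.sents."""

    def __init__(self, doc):
        self._sents = []
        for s in doc.sents:  # a plain str has no .sents attribute
            self._sents.append(s)
        self.doc = doc


class FakeDoc:
    """Tiny stand-in for a parsed document that exposes .sents."""

    def __init__(self, text):
        # Naive split on ". " purely for the demo; spaCy does real sentence segmentation.
        self.sents = [part for part in text.split(". ") if part]


raw_text = "It used to have offline maps. It never downloads the map."

# Passing the raw string reproduces the error from the question.
try:
    ContextSketch(raw_text)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'str' object has no attribute 'sents'

# Passing an object that exposes .sents works.
ctx = ContextSketch(FakeDoc(raw_text))
print(len(ctx._sents))  # 2
```

In the real code, the object returned by nlp(...) plays the role of FakeDoc, which is why wrapping the string in nlp(...) before passing it to Context removes the AttributeError.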
