Error when extracting trigrams with gensim's Phrases

Posted 2024-06-17 10:15:21


I want to extract all bigrams and trigrams from the given sentences.

from gensim.models import Phrases
documents = ["the mayor of new york was there", "Human Computer Interaction is a great and new subject", "machine learning can be useful sometimes","new york mayor was present", "I love machine learning because it is a new subject area", "human computer interaction helps people to get user friendly applications"]

sentence_stream = [doc.split(" ") for doc in documents]
bigram = Phrases(sentence_stream, min_count=1, threshold=2, delimiter=b' ')
trigram = Phrases(bigram(sentence_stream, min_count=1, threshold=2, delimiter=b' '))

for sent in sentence_stream:
    #print(sent)
    bigrams_ = bigram[sent]
    trigrams_ = trigram[bigrams_]

    print(bigrams_)
    print(trigrams_)

The code works fine for bigrams and captures "new york" and "machine learning" as bigrams.

However, when I try to add the trigram step, I get the following error.

^{pr2}$

Please let me know how to correct my code.

I am following the example in the gensim documentation.


1 answer
User
#1 · Posted 2024-06-17 10:15:21

According to the docs, you can do:

from gensim.models import Phrases
from gensim.models.phrases import Phraser 

phrases = Phrases(sentence_stream)
bigram = Phraser(phrases)
trigram = Phrases(bigram[sentence_stream])

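The bigram object is a Phrases instance, so it cannot be called like a function, which is what your code does with bigram(sentence_stream, ...). Index it with square brackets instead, as in bigram[sentence_stream], and pass min_count and threshold to the outer Phrases constructor.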
