Python: Using Gensim and Scikit in the same pipeline

Posted 2024-04-26 18:35:16

I want to use Gensim and Scikit in the same pipeline.

[Update] The corpus is created from a list of lemmatized tokens, doc.tokens:

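from gensim import corpora

# Collect the lemmatized token list of every document.
bowlist = []
for doc in linked_doc_list:
    bowlist.append(doc.tokens)

# Map tokens to integer ids, then build the bag-of-words corpus.
dictionary = corpora.Dictionary(bowlist)
corpus = [dictionary.doc2bow(line) for line in bowlist]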

This involves converting the Gensim corpus to a NumPy array, as follows:

numpy_matrix = gensim.matutils.corpus2dense(package.corpus, num_terms=len(package.dict.token2id))

This seems to work. The scikit-learn LDA (SKLDA) runs:

from sklearn.decomposition import LatentDirichletAllocation

model = LatentDirichletAllocation(n_components=components,
                                  max_iter=maxiter,
                                  learning_method=learningmethod,
                                  learning_offset=learningoffset,
                                  random_state=randomstate,
                                  verbose=verbose).fit(numpy_matrix)
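One thing that may be worth checking at this step is the orientation of numpy_matrix. Gensim's documentation describes the result of corpus2dense as having documents as columns, i.e. shape (num_terms, num_docs), whereas scikit-learn estimators expect one row per document. The sketch below uses only NumPy on a toy corpus; dense_terms_by_docs is a hypothetical stand-in that mimics that layout (it is not the Gensim function itself), and the ids and counts are made up.

```python
import numpy as np

# Toy bag-of-words corpus in Gensim's format: one list of (token_id, count)
# pairs per document. The ids and counts are invented for illustration.
corpus = [
    [(0, 1), (1, 2)],          # doc 0
    [(1, 1), (2, 1), (3, 1)],  # doc 1
    [(0, 3), (3, 1)],          # doc 2
]
num_terms = 4


def dense_terms_by_docs(bow_corpus, num_terms):
    """Hypothetical stand-in for gensim.matutils.corpus2dense.

    Returns an array of shape (num_terms, num_docs): terms are rows,
    documents are columns.
    """
    dense = np.zeros((num_terms, len(bow_corpus)), dtype=np.float32)
    for doc_idx, bow in enumerate(bow_corpus):
        for token_id, count in bow:
            dense[token_id, doc_idx] = count
    return dense


terms_by_docs = dense_terms_by_docs(corpus, num_terms)
docs_by_terms = terms_by_docs.T  # one row per document, as scikit-learn expects

print(terms_by_docs.shape)  # (num_terms, num_docs)
print(docs_by_terms.shape)  # (num_docs, num_terms)
```

If that reading is right, the equivalent change here would be to append .T to the corpus2dense(...) call before fitting. Fitting the untransposed matrix would make model.components_ carry one column per document rather than one per term, so looking those indices up in the dictionary would return unrelated words.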

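But now, to read the results, I need to look up the actual terms in the Gensim dictionary (otherwise I am stuck with meaningless feature numbers).

However, the output of the code below is clearly nonsense.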

def filterAndReportResultsLDA(self, model, gensimdict, n_top_words=10):
    for topic_idx, topic in enumerate(model.components_):
        print("Topic %d:" % (topic_idx))
        words = []
        # Indices of the n_top_words largest weights, highest first.
        for i in topic.argsort()[:-n_top_words - 1:-1]:
            words.append(gensimdict[i])
        print(words)

An example result:

['reporting.', '7:23', 'users?', 'breaking', '5am', 'bell', 'c7n', 'content?', 'functions', 'vi']

Can anyone tell me what I am doing wrong?
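One way to narrow this down is to confirm that model.components_ has as many columns as the dictionary has entries before looking any word up. The sketch below is a minimal illustration under stated assumptions: components, id2token, and the helper top_words are hypothetical stand-ins (not from the post), with a plain dict playing the role of the Gensim dictionary.

```python
import numpy as np

# Hypothetical stand-ins: a 2-topic components_ array over a 5-word
# vocabulary, and an id -> token mapping like a Gensim Dictionary offers.
components = np.array([
    [0.1, 2.0, 0.3, 1.5, 0.2],
    [1.8, 0.1, 0.9, 0.2, 1.1],
])
id2token = {0: "cat", 1: "dog", 2: "fish", 3: "bird", 4: "horse"}


def top_words(components, id2token, n_top_words=2):
    """Return the n_top_words highest-weighted tokens for each topic."""
    n_features = components.shape[1]
    if n_features != len(id2token):
        # A mismatch suggests the columns are not vocabulary terms,
        # e.g. because the input matrix was documents-as-columns.
        raise ValueError(
            f"components has {n_features} columns but the vocabulary has "
            f"{len(id2token)} entries; the input matrix may be transposed"
        )
    topics = []
    for topic in components:
        top_ids = topic.argsort()[::-1][:n_top_words]  # highest weight first
        topics.append([id2token[int(i)] for i in top_ids])
    return topics


for topic_idx, words in enumerate(top_words(components, id2token)):
    print(f"Topic {topic_idx}: {words}")
```

The lookup only makes sense if the dictionary passed in is the same one that produced the corpus. The post builds the corpus from dictionary but converts package.corpus with package.dict, and it is not clear from the post whether these are the same objects.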

0 answers

No answers yet.
