How to print the LDA topic model from gensim? (Python)

Posted 2024-04-25 05:10:34


Using gensim I was able to extract topics from a set of documents with LSA, but how do I access the topics generated by the LDA model?

When printing lda.print_topics(10), the code gives the following error, because print_topics() returns a NoneType:

Traceback (most recent call last):
  File "/home/alvas/workspace/XLINGTOP/xlingtop.py", line 93, in <module>
    for top in lda.print_topics(2):
TypeError: 'NoneType' object is not iterable

Code:

from gensim import corpora, models, similarities
from gensim.models import hdpmodel, ldamodel
from itertools import izip

documents = ["Human machine interface for lab abc computer applications",
              "A survey of user opinion of computer system response time",
              "The EPS user interface management system",
              "System and human system engineering testing of EPS",
              "Relation of user perceived response time to error measurement",
              "The generation of random binary unordered trees",
              "The intersection graph of paths in trees",
              "Graph minors IV Widths of trees and well quasi ordering",
              "Graph minors A survey"]

# remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

# remove words that appear only once
all_tokens = sum(texts, [])
tokens_once = set(word for word in set(all_tokens) if all_tokens.count(word) == 1)
texts = [[word for word in text if word not in tokens_once]
         for text in texts]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# I can print out the topics for LSA
tfidf = models.TfidfModel(corpus)
corpus_tfidf = tfidf[corpus]
lsi = models.LsiModel(corpus_tfidf, id2word=dictionary, num_topics=2)
corpus_lsi = lsi[corpus]

for l,t in izip(corpus_lsi,corpus):
  print l,"#",t
print
for top in lsi.print_topics(2):
  print top

# I can print out the documents and the most probable topics for each doc.
lda = ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=50)
corpus_lda = lda[corpus]

for l,t in izip(corpus_lda,corpus):
  print l,"#",t
print

# But I am unable to print out the topics. How should I do it?
for top in lda.print_topics(10):
  print top

3 answers

I think the syntax of show_topics has changed over time:

show_topics(num_topics=10, num_words=10, log=False, formatted=True)

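For num_topics topics, return the num_words most significant words (10 words per topic by default).

The topics are returned as a list: a list of strings if formatted is True, or a list of (probability, word) 2-tuples if it is False.

If log is True, the result is also written to the log.

Unlike LSA, there is no natural ordering between the topics in LDA. The returned subset of num_topics <= self.num_topics topics is therefore arbitrary and may change between two LDA training runs.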
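
The return shape described above can be handled with a small helper. This is a minimal sketch for Python 3, assuming only the show_topics signature quoted in this answer. The names iter_topic_strings and StubModel are hypothetical and introduced for illustration; the stub stands in for a trained model so the snippet runs without gensim. As far as I know, later gensim releases return (topic_id, string) pairs rather than bare strings, so the helper accepts either shape.

```python
def iter_topic_strings(model, num_topics=10, num_words=10):
    """Yield (label, text) for each topic returned by model.show_topics()."""
    rows = model.show_topics(num_topics=num_topics, num_words=num_words,
                             formatted=True)
    for position, row in enumerate(rows):
        if isinstance(row, tuple):
            # Assumed newer shape: (topic_id, "0.083*response + ...")
            label, text = row
        else:
            # Shape described above: a bare string. The list position is
            # only a label here, not necessarily the model's topic id.
            label, text = position, row
        yield label, text


class StubModel:
    """Hypothetical stand-in for a trained LDA model (not part of gensim)."""

    def show_topics(self, num_topics=10, num_words=10, log=False,
                    formatted=True):
        topics = ["0.083*response + 0.083*interface",
                  "0.083*trees + 0.083*system"]
        return topics[:num_topics]


for label, text in iter_topic_strings(StubModel(), num_topics=2):
    print(label, text)
```

With a real model, passing the trained lda object in place of StubModel() should work the same way, provided its show_topics accepts these keyword arguments.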

After some messing around, it seems that print_topics(numoftopics) for the ldamodel has a bug. My workaround is to use print_topic(topicid):

>>> print lda.print_topics()
None
>>> for i in range(lda.num_topics):
...     print lda.print_topic(i)
0.083*response + 0.083*interface + 0.083*time + 0.083*human + 0.083*user + 0.083*survey + 0.083*computer + 0.083*eps + 0.083*trees + 0.083*system
...
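
The per-topic workaround can be wrapped in a helper that visits every topic id. This is a minimal sketch for Python 3, assuming the model exposes num_topics and print_topic(topicid) as used in this answer. The names all_topic_strings and StubLda are hypothetical, and the stub replaces a trained model so the snippet runs on its own. Note that range(n) already stops at n - 1, so subtracting one from num_topics would skip the last topic.

```python
def all_topic_strings(model):
    """Return one formatted string per topic, for ids 0 .. num_topics - 1."""
    return [model.print_topic(topic_id)
            for topic_id in range(model.num_topics)]


class StubLda:
    """Hypothetical stand-in for a trained LDA model (not part of gensim)."""
    num_topics = 3

    def print_topic(self, topic_id):
        return "0.083*word%d" % topic_id


for line in all_topic_strings(StubLda()):
    print(line)
```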

Are you using any logging? According to the docs, print_topics prints to the log file.

As @mac389 says, lda.show_topics() is the way to print to the screen.
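
If the topics are only written to the log, they stay invisible until logging is configured. This is a minimal sketch for Python 3 using only the standard logging module. The function log_topics and the logger name demo.topics are hypothetical stand-ins for a method that logs at INFO level and returns nothing, which would match the None seen in the question.

```python
import logging
import sys

# Show INFO-level records on stdout; by default Python only shows
# WARNING and above.
logging.basicConfig(stream=sys.stdout,
                    format="%(levelname)s : %(message)s",
                    level=logging.INFO)

logger = logging.getLogger("demo.topics")  # hypothetical logger name


def log_topics(topic_strings):
    """Stand-in for a log-only method: logs each topic, returns None."""
    for topic_id, text in enumerate(topic_strings):
        logger.info("topic #%d: %s", topic_id, text)


result = log_topics(["0.083*response + 0.083*interface"])
print(result)  # the function has no return statement
```

Calling the same logging.basicConfig(...) line before training a gensim model is the usual way to make its INFO-level messages visible.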
