How do I print the LDA topic models from gensim? (Python)

from gensim import corpora, models, similarities
from gensim.models import hdpmodel, ldamodel
from itertools import izip

documents = ["Human machine interface for lab abc computer applications",
             "A survey of user opinion of computer system response time",
             "The EPS user interface management system",
             "System and human system engineering testing of EPS",
             "Relation of user perceived response time to error measurement",
             "The generation of random binary unordered trees",
             "The intersection graph of paths in trees",
             "Graph minors IV Widths of trees and well quasi ordering",
             "Graph minors A survey"]

# remove common words and tokenize
stoplist = set('for a of the and to in'.split())
texts = [[word for word in document.lower().split() if word not in stoplist]
         for document in documents]

# remove words that appear only once
all_tokens = sum(texts, [])
tokens_once = set(word for word in set(all_tokens) if all_tokens.count(word) == 1)
texts = [[word for word in text if word not in tokens_once]
         for text in texts]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

# I can print out the topics for LSA
lsi = models.LsiModel(corpus_tfidf, id2word=dictionary, num_topics=2)
corpus_lsi = lsi[corpus]

for l,t in izip(corpus_lsi,corpus):
  print l,"#",t
print
for top in lsi.print_topics(2):
  print top

# I can print out the documents and which is the most probable topics for each doc.
lda = ldamodel.LdaModel(corpus, id2word=dictionary, num_topics=50)
corpus_lda = lda[corpus]

for l,t in izip(corpus_lda,corpus):
  print l,"#",t
print

# But I am unable to print out the topics, how should i do it?
for top in lda.print_topics(10):
  print top

3 Answers

User

Answer 1 · edited 2024-04-25 05:10:34

I think the syntax of show_topics has changed over time:

show_topics(num_topics=10, num_words=10, log=False, formatted=True)

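For num_topics number of topics, return the num_words most significant words (10 words per topic by default).

The topics are returned as a list: a list of strings if formatted is True, or a list of (probability, word) 2-tuples if it is False.

If log is True, also output this result to the log.

Unlike LSA, there is no natural ordering between the topics in LDA. The returned subset of num_topics <= self.num_topics topics is therefore arbitrary and may change between two LDA training runs.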
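To make the two return shapes described above concrete, here is a minimal sketch that does not need gensim at all. It converts a formatted topic string, like the one shown elsewhere in this thread, into (probability, word) tuples. The helper name parse_topic is hypothetical and not part of gensim, and the quote stripping is there on the assumption that some gensim versions wrap each word in double quotes.

```python
def parse_topic(formatted):
    """Split a 'p*word + p*word' topic string into [(p, word), ...].

    Hypothetical helper for illustration; not a gensim function.
    """
    pairs = []
    for term in formatted.split(" + "):
        weight, word = term.split("*", 1)
        # Drop surrounding whitespace and optional double quotes around the word.
        pairs.append((float(weight), word.strip().strip('"')))
    return pairs


topic = "0.083*response + 0.083*interface + 0.083*time"
print(parse_topic(topic))
```

In practice, asking show_topics for formatted=False should make this kind of parsing unnecessary; the sketch only shows how the two forms relate.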

User

Answer 2 · edited 2024-04-25 05:10:34

After some messing around, it seems that print_topics(numoftopics) has a bug for the ldamodel. So my workaround is to use print_topic(topicid):

>>> print lda.print_topics()
None
>>> for i in range(lda.num_topics):
...     print lda.print_topic(i)
...
0.083*response + 0.083*interface + 0.083*time + 0.083*human + 0.083*user + 0.083*survey + 0.083*computer + 0.083*eps + 0.083*trees + 0.083*system
...

User

Answer 3 · edited 2024-04-25 05:10:34

Are you using any logging? print_topics prints to the log file, as stated in the docs.

As @mac389 says, lda.show_topics() is the way to print to the screen.
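If the logging route is what you want, a minimal sketch of turning on INFO-level logging with the standard library looks like the following. The logger name "gensim.models.ldamodel" is an assumption based on gensim naming its loggers after its modules, and the message logged here is a stand-in written by this example, not real gensim output.

```python
import logging

# Send INFO-level messages (and above) to the console.
# force=True (Python 3.8+) replaces any handlers configured earlier.
logging.basicConfig(
    format="%(asctime)s : %(levelname)s : %(message)s",
    level=logging.INFO,
    force=True,
)

# Assumed logger name; this stands in for the library's own logger.
demo_logger = logging.getLogger("gensim.models.ldamodel")
demo_logger.info("topic #0: 0.083*response + 0.083*interface")
```

With logging configured before the model is trained, anything the library writes at INFO level should show up on the console instead of being silently dropped.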
