Python: wordcloud, repeated words

Published 2024-05-15 22:27:04

In my word cloud I get repeated words, and I don't understand why they aren't counted together and shown as a single word.

from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

word_string = 'oh oh oh oh oh oh verse wrote book stand title book would life superman thats make feel count privilege love ideal honored know feel see everyday things things say rock baby truth rock love rock rock everything need rock baby rock wanna kiss ya feel ya please ya right wanna touch ya love ya baby night reward ya things rock love rock love rock oh oh oh verse try count ways make smile id run fingers run timeless things talk sugar keeps going make wanna keep lovin strong make wanna try best give want need give whole heart little piece minimum talking everything single wish talking every dream rock baby truth rock love rock rock everything need rock baby rock wanna kiss ya feel ya please ya right wanna touch ya love ya baby night reward ya things rock love rock wanna rock bridge theres options dont want theyre worth time cause oh thank like us fine rock sand smile cry joy pain truth lies matter know count oh oh oh oh oh oh rock baby truth rock love rock rock everything need rock baby rock wanna kiss ya feel ya please ya right wanna touch ya love ya baby night reward ya things rock love rock love rock oh oh oh oh oh oh wanna kiss ya feel ya please ya right wanna touch ya love ya baby night reward ya things rock love rock wanna rock party people people party popping sitting around see looking looking see look started lets hook little one one come give stuff let freshin ruff lets go lets hook start wont stop baby baby dont stop come give stuff lets go black culture black culture black culture black culture party people people party popping sitting around see looking looking see look started lets hook come one give stuff let freshin little one one ruff lets go lets hook start wont stop baby baby dont stop come give stuff lets go black culture black culture black culture black culture lets hook come give stuff let freshin little one one ruff lets go lets hook start wont stop baby baby dont stop come give stuff lets go black culture black culture black culture black culture black culture black culture black culture black culture'
# Build the cloud with the default settings, dropping common English stopwords
wordcloud = WordCloud(background_color="white",
                      width=1200, height=1000,
                      stopwords=STOPWORDS
                      ).generate(word_string)
plt.imshow(wordcloud)
plt.show()

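As you can see, words like love, oh, rock, black, and culture appear several times, and they don't seem to be counted together. What am I doing wrong?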

[Image: the generated word cloud]


4 answers

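If you look at wordcloud.words_, you'll see that the frequency table contains some two-word phrases, such as "oh oh", "hook start", "let go", and "let hook".

You would need to dig into the code behind .process_text() to understand why it does this.

As a workaround, you can split word_string yourself to build a word-frequency table, then use .generate_from_frequencies() to create the image.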
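A minimal sketch of that workaround might look like the following. The helper names count_words and render_cloud and the short sample text are made up for illustration; only WordCloud and generate_from_frequencies() come from the library. The wordcloud import is deferred so that the counting step can be tried on its own.

```python
from collections import Counter


def count_words(text, stopwords=frozenset()):
    """Count single whitespace-separated words, skipping any stopwords."""
    return Counter(w for w in text.lower().split() if w not in stopwords)


def render_cloud(frequencies):
    """Draw a cloud from a word -> count mapping (requires wordcloud)."""
    # Imported here so count_words can be used without wordcloud installed.
    from wordcloud import WordCloud
    cloud = WordCloud(background_color="white", width=1200, height=1000)
    return cloud.generate_from_frequencies(frequencies)


# Hypothetical short sample; in the question this would be word_string.
sample = "rock love rock oh oh oh black culture black culture"
frequencies = count_words(sample)
print(frequencies.most_common())
```

To draw the result, something like plt.imshow(render_cloud(count_words(word_string, STOPWORDS))) should work, since every word is then counted once under a single key and no two-word phrases are created.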

This is a feature of the word_cloud project called "collocations". You can turn it off by setting collocations=False, like this:

    wordcloud = WordCloud(collocations=False).generate(word_string)

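This stops words that frequently occur together in the text from being grouped into two-word phrases. It removes some phrases you probably don't want, such as "oh oh", but also some you might want, such as "black culture".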
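To make the idea of a collocation concrete, here is a simplified illustration that counts adjacent word pairs using only the standard library. It is not wordcloud's implementation: the helper name count_pairs and the sample text are assumptions for this sketch, and the library applies its own rules to decide which pairs are worth keeping.

```python
from collections import Counter


def count_pairs(text):
    """Count adjacent two-word phrases in whitespace-separated text."""
    words = text.lower().split()
    # zip(words, words[1:]) yields each word together with its successor.
    return Counter(" ".join(pair) for pair in zip(words, words[1:]))


# Hypothetical short sample standing in for the lyrics in the question.
sample = "black culture black culture rock love rock"
pairs = count_pairs(sample)
print(pairs.most_common(2))
```

When a pair such as "black culture" is kept as its own entry, the individual words can also appear elsewhere in the cloud, which is why the same word seems to show up more than once.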
