My WordCloud is missing the letter "s" at the end of words

Posted 2024-06-10 11:25:19


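At first I thought the problem was in my data and that I had made a mistake while cleaning it. But I checked, and that is not the case.

I am using the following code: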

import matplotlib.pyplot as plt
from wordcloud import WordCloud

plt.style.use('fivethirtyeight')

# df is the DataFrame of tweets shown further down; join all tweets into one string
allWords = ' '.join([twts for twts in df['full_text']])
wordCloud = WordCloud(collocations=True, width=1000, height=600,
                      random_state=21, max_font_size=120).generate(allWords)

plt.imshow(wordCloud, interpolation="bilinear")
plt.axis('off')
plt.show()

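Now my word cloud shows words such as "coronaviru", "viru", and "crisi". With collocations=True, it shows the complete words as well as phrases such as "coronavirus cases" and "coronavirus pandemic". Does anyone know how to fix this? As I said, I checked the data and it is always correct, so I assume the error occurs in wordcloud.

My data looks like this: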

    created_at                        id                full_text
0   Sat Aug 01 00:25:53 +0000 2020    28934685093219    life is hard with coronavirus
1   Sat Aug 01 00:25:53 +0000 2020    28934685093219    coronavirus sucks

2 answers

You must be doing something wrong; your code works for me:

import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud

array = {'full_text': ['life is hard with coronavirus', 'coronavirus sucks']}
df = pd.DataFrame(array)

plt.style.use('fivethirtyeight')
allWords = ' '.join([twts for twts in df['full_text']])
wordCloud = WordCloud(collocations=True, width = 1000,
height=600, random_state = 21, max_font_size = 120).generate(allWords)

plt.imshow(wordCloud, interpolation = "bilinear")
plt.axis('off')
plt.show()

This is the output:

[Image: the generated word cloud]

You need to change one parameter of the WordCloud constructor: normalize_plurals=False. Reference: https://amueller.github.io/word_cloud/generated/wordcloud.WordCloud.html

normalize_plurals: bool, default=True. Whether to remove trailing ‘s’ from words. If True and a word appears with and without a trailing ‘s’, the one with trailing ‘s’ is removed and its counts are added to the version without trailing ‘s’ – unless the word ends with ‘ss’. Ignored if using generate_from_frequencies.
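To make the quoted rule concrete, here is a small pure-Python sketch of it. This is an illustration of the documented behaviour only, not the library's actual implementation; the function name merge_trailing_s and the sample words are hypothetical.

```python
from collections import Counter


def merge_trailing_s(counts):
    """Fold 'Xs' into 'X' when both occur, skipping words that end in 'ss'."""
    merged = Counter(counts)
    for word in list(merged):
        if word.endswith("s") and not word.endswith("ss"):
            singular = word[:-1]
            # Only merge when the form without the trailing "s" is also present.
            if singular in merged:
                merged[singular] += merged.pop(word)
    return dict(merged)


# Hypothetical input: "coronaviru" occurs once, e.g. from a truncated tweet.
counts = Counter("coronavirus coronavirus coronaviru class clas".split())
print(merge_trailing_s(counts))
```

Under this rule a word such as "coronavirus" only loses its "s" when the text also contains "coronaviru", so it may be worth checking the input for truncated words as well.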
