<h2>TL;DR</h2>
<p>The <code>demo_liu_hu_lexicon</code> function is a demonstration of how the <code>opinion_lexicon</code> can be used. It is meant for testing and should not be used directly.</p>
<hr/>
<h2>In Long</h2>
<p>Let's walk through the function and see how to recreate a similar one: <a href="https://github.com/nltk/nltk/blob/develop/nltk/sentiment/util.py#L616" rel="nofollow noreferrer">https://github.com/nltk/nltk/blob/develop/nltk/sentiment/util.py#L616</a></p>
<pre><code>def demo_liu_hu_lexicon(sentence, plot=False):
    """
    Basic example of sentiment classification using Liu and Hu opinion lexicon.
    This function simply counts the number of positive, negative and neutral words
    in the sentence and classifies it depending on which polarity is more represented.
    Words that do not appear in the lexicon are considered as neutral.

    :param sentence: a sentence whose polarity has to be classified.
    :param plot: if True, plot a visual representation of the sentence polarity.
    """
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank

    tokenizer = treebank.TreebankWordTokenizer()
</code></pre>
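<p>OK, having the imports inside the function is odd, but that is because it is a demo function meant for simple testing or documentation.</p>
<p>The use of <code>treebank.TreebankWordTokenizer()</code> is also rather odd; we could simply use <code>nltk.word_tokenize</code>.</p>
<p>Let's move the imports out and rewrite <code>demo_liu_hu_lexicon</code> as a <code>simple_sentiment</code> function.</p>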
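<pre><code>from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    pass  # body filled in below
</code></pre>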
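<p>Next, look at the first few lines of the original function:</p>
<pre><code>def demo_liu_hu_lexicon(sentence, plot=False):
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank

    tokenizer = treebank.TreebankWordTokenizer()
    pos_words = 0
    neg_words = 0
    tokenized_sent = [word.lower() for word in tokenizer.tokenize(sentence)]

    x = list(range(len(tokenized_sent))) # x axis for the plot
    y = []
</code></pre>
<p>Here the function:</p>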
<ol>
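<li>First tokenizes and lowercases the sentence.</li>
<li>Initializes the counts of positive and negative words.</li>
<li>Initializes <code>x</code> and <code>y</code> for some plotting later on, so we can ignore them.</li>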
</ol>
<p>If we go further down the function:</p>
<pre><code>def demo_liu_hu_lexicon(sentence, plot=False):
    from nltk.corpus import opinion_lexicon
    from nltk.tokenize import treebank

    tokenizer = treebank.TreebankWordTokenizer()
    pos_words = 0
    neg_words = 0
    tokenized_sent = [word.lower() for word in tokenizer.tokenize(sentence)]

    x = list(range(len(tokenized_sent))) # x axis for the plot
    y = []

    for word in tokenized_sent:
        if word in opinion_lexicon.positive():
            pos_words += 1
            y.append(1) # positive
        elif word in opinion_lexicon.negative():
            neg_words += 1
            y.append(-1) # negative
        else:
            y.append(0) # neutral

    if pos_words > neg_words:
        print('Positive')
    elif pos_words < neg_words:
        print('Negative')
    elif pos_words == neg_words:
        print('Neutral')
</code></pre>
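<ol start="4">
<li><p>The loop simply goes through each token and checks whether the word is in the positive or negative lexicon.</p></li>
<li><p>Finally, it compares the number of positive and negative words and prints the label.</p></li>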
</ol>
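<p>To make steps 4 and 5 concrete, here is a minimal trace of the same loop on a toy sentence. The two word sets are hypothetical stand-ins for <code>opinion_lexicon.positive()</code> and <code>opinion_lexicon.negative()</code>, <code>str.split()</code> stands in for the tokenizer, and <code>trace_polarity</code> is a name made up for this sketch, so the snippet runs without NLTK.</p>

```python
# Hypothetical mini lexicons standing in for the real opinion lexicon.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def trace_polarity(sentence):
    """Return (y, pos_words, neg_words) the way the demo's loop builds them."""
    # str.split() is a stand-in for a real tokenizer (assumption).
    tokens = [word.lower() for word in sentence.split()]
    pos_words = 0
    neg_words = 0
    y = []
    for word in tokens:
        if word in POSITIVE:
            pos_words += 1
            y.append(1)   # positive
        elif word in NEGATIVE:
            neg_words += 1
            y.append(-1)  # negative
        else:
            y.append(0)   # neutral
    return y, pos_words, neg_words


y, pos_words, neg_words = trace_polarity("I love this great phone but hate the battery")
print(y)                     # [0, 1, 0, 1, 0, 0, -1, 0, 0]
print(pos_words, neg_words)  # 2 1
```

<p>With two positive words against one negative word, the demo would print <code>Positive</code> for this sentence.</p>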
<p>Now that we know what <code>demo_liu_hu_lexicon</code> does, let's see whether we can write a better <code>simple_sentiment</code> function.</p>
<p>The tokenization in step 1 cannot be avoided, so we have:</p>
<pre><code>from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    # Step 1: tokenize and lowercase the input.
    tokens = [word.lower() for word in word_tokenize(text)]
</code></pre>
<p>For steps 2 to 5, the lazy way is to copy and paste the code and change <code>print()</code> to <code>return</code>:</p>
<pre><code>from nltk.corpus import opinion_lexicon
from nltk import word_tokenize

def simple_sentiment(text):
    # Step 1: tokenize and lowercase the input.
    tokens = [word.lower() for word in word_tokenize(text)]

    # Step 2: initialize the counts (x and y were only needed for plotting).
    pos_words = 0
    neg_words = 0

    # Step 4: count the tokens found in the positive and negative lexicons.
    for word in tokens:
        if word in opinion_lexicon.positive():
            pos_words += 1
        elif word in opinion_lexicon.negative():
            neg_words += 1

    # Step 5: return the label instead of printing it.
    if pos_words > neg_words:
        return 'Positive'
    elif pos_words < neg_words:
        return 'Negative'
    else:
        return 'Neutral'
</code></pre>
<p>Now you have a function that you can do whatever you like with.</p>
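<p>One possible refinement, offered as a sketch rather than as the library's way of doing it: the loop calls <code>opinion_lexicon.positive()</code> and <code>opinion_lexicon.negative()</code> again for every token, and as far as I can tell a membership test against those word lists is a linear scan. Building each lexicon as a <code>set</code> once and passing it in should make lookups much cheaper. The function name <code>classify_tokens</code> and its parameters are hypothetical, and the toy sets below only exist so the example runs without NLTK data.</p>

```python
def classify_tokens(tokens, positive, negative):
    """Label a token list by comparing positive and negative word counts.

    `positive` and `negative` are sets of lowercase words. They are
    hypothetical parameters; the original demo reads them from opinion_lexicon.
    """
    pos_words = sum(1 for token in tokens if token in positive)
    # A word present in both sets counts as positive only, mirroring if/elif.
    neg_words = sum(
        1 for token in tokens if token in negative and token not in positive
    )
    if pos_words > neg_words:
        return 'Positive'
    if pos_words < neg_words:
        return 'Negative'
    return 'Neutral'


# Toy lexicons (assumption) so the sketch runs on its own.
positive = {"good", "great", "love"}
negative = {"bad", "awful", "hate"}

print(classify_tokens("this phone is great".split(), positive, negative))
print(classify_tokens("i hate the awful battery".split(), positive, negative))
print(classify_tokens("it is a phone".split(), positive, negative))
```

<p>With NLTK available, the sets could presumably be built once with <code>set(opinion_lexicon.positive())</code> and <code>set(opinion_lexicon.negative())</code>, and the tokens produced by <code>word_tokenize</code> as above.</p>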
<hr/>
<p>BTW, the demo is really odd.</p>
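<p>For each positive word the function increments <code>pos_words</code> and appends <code>1</code> to <code>y</code>; for each negative word it increments <code>neg_words</code> and appends <code>-1</code> to <code>y</code>.
It then says something is positive when <code>pos_words > neg_words</code>.</p>
<p>Note that <code>pos_words</code> and <code>neg_words</code> are plain integer counts, so that check is an ordinary integer comparison, and the list <code>y</code> is only used for plotting. If lists of integers such as <code>y</code> were compared instead, the result would follow Python's sequence comparison, which might not have any linguistic or mathematical logic here (see <a href="https://stackoverflow.com/questions/47342100/what-happens-when-we-compare-list-of-integers?noredirect=1#comment81636371_47342100">What happens when we compare list of integers?</a>)</p>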
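<p>As a quick illustration of the difference between comparing integer counts and comparing lists of integers in Python: lists are compared lexicographically, element by element, so the first differing position decides the result. The two polarity lists below are made-up examples, not output from the demo.</p>

```python
# Per-token polarity lists for two hypothetical sentences.
y_a = [1, -1, -1, -1]  # one positive word first, then three negative words
y_b = [0, 1, 1, 1]     # a neutral word first, then three positive words

# Lists compare lexicographically: the first differing element decides.
print(y_a > y_b)            # True, because 1 > 0 at index 0

# Summing the polarities tells the opposite story.
print(sum(y_a) > sum(y_b))  # False, because -2 < 3

# Integer counts, as in pos_words > neg_words, compare by value.
pos_words, neg_words = y_a.count(1), y_a.count(-1)
print(pos_words > neg_words)  # False: 1 positive word versus 3 negative
```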