In Python, how do I count negative or positive words that appear before a specific word (sentiment analysis)?

Posted 2024-04-26 17:41:30


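I am trying to count how many times a negative word from a list appears directly before a specific word. For example, given "This terrible laptop" with "laptop" as the specified word, I want the Python output to be "terrible 1".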

def run(path):
    negWords={} #dictionary to return the count
    #load the negative lexicon
    negLex=loadLexicon('negative-words.txt')
    fin=open(path)

    for line in fin: #for every line in the file (1 review per line)
        line=line.lower().strip().split(' ')
        review_set=set() #Adding all the words in the review to a set

        for word in line: #Check if the word is present in the line
            review_set.add(word)  #As it is a set, only adds one time

        for word in review_set:
            if word in negLex:
                if word in negWords:
                    negWords[word]=negWords[word]+1
                else:
                    negWords[word] = 1

    fin.close()
    return negWords

if __name__ == "__main__": 
    print(run('textfile'))

2 answers

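This should do what you are asking for. It uses sets and intersection to avoid some of the looping. The steps are:

  1. Get the negative words that occur in the line
  2. Find the position of each of those words
  3. If the word after that position is "laptop", record it

Note that this only identifies the first occurrence of a negative word in a line, so a line such as "terrible terrible laptop" will not match.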

from collections import defaultdict

def run(path):

    negWords=defaultdict(int)  # A defaultdict(int) will start at 0, can just add.

    #load the negative lexicon
    negLex=loadLexicon('negative-words.txt')
    # ?? Is the above a list or a set, if it's a list convert to set
    negLex = set(negLex)

    fin=open(path)

    for line in fin: #for every line in the file (1 review per line)
        line=line.lower().strip().split(' ')

        # Can just pass a list to set() to make a set of its items.
        review_set = set(line)

        # Compare the review set against the neglex set. We want words that are in
        # *both* sets, so we can use intersection.
        neg_words_used = review_set & negLex

        # Is the bad word followed by the word laptop?            
        for word in neg_words_used:
            # Find the word in the line list
            ix = line.index(word)
            if ix > len(line) - 2:
                # Can't have laptop after it, it's the last word.
                continue

            # The word after this index in the line is laptop.
            if line[ix+1] == 'laptop':
                negWords[word] += 1

    fin.close()
    return negWords
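
The first-occurrence limitation noted above comes from `line.index(word)`, which only returns the first position of a word. One possible way around it is to walk every adjacent pair of words instead. This is a minimal sketch, not part of the original answer: the function name `count_neg_before`, the in-memory `reviews` list and the small lexicon are all hypothetical stand-ins for the file and `loadLexicon` used above.

```python
from collections import defaultdict

def count_neg_before(lines, neg_lex, target="laptop"):
    """Count negative words that sit directly before `target`, at every position."""
    counts = defaultdict(int)
    for line in lines:
        words = line.lower().split()
        # zip(words, words[1:]) pairs each word with the word that follows it
        for prev, nxt in zip(words, words[1:]):
            if nxt == target and prev in neg_lex:
                counts[prev] += 1
    return dict(counts)

# Hypothetical sample data, one review per string
reviews = ["Terrible terrible laptop", "An awful laptop and a bad screen"]
print(count_neg_before(reviews, {"terrible", "awful", "bad"}))
# prints {'terrible': 1, 'awful': 1}
```

Because every pair is inspected, the repeated "terrible" in the first review no longer hides the match, while "bad" is ignored because it is followed by "screen".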

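If you are only interested in the word that precedes "laptop", a far more sensible approach is to look for the word "laptop" and then check the word before it to see whether it is a negative word. The steps are:

  1. Find "laptop" in the current line
  2. If "laptop" is not in the line, or it is the first word, skip the line
  3. Read the word before "laptop" and check it against the negative words
  4. If there is a match, add it to the result

This avoids doing lookups for words that have nothing to do with "laptop".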
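
The four steps above can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions rather than the answer's own code: the function name `count_before_target` is hypothetical, the lexicon is assumed to be a plain set, lines are passed in as an iterable of strings instead of being read from a file, and surrounding punctuation is stripped so that "laptop." still counts as "laptop".

```python
import string
from collections import defaultdict

def count_before_target(lines, neg_lex, target="laptop"):
    """Count negative words found directly before the first `target` in each line."""
    neg_words = defaultdict(int)
    for line in lines:
        # Assumption: strip punctuation around each token ("laptop." -> "laptop")
        words = [w.strip(string.punctuation) for w in line.lower().split()]

        # Steps 1-2: locate the target; skip the line if it is absent or comes first
        if target not in words:
            continue
        ix = words.index(target)
        if ix == 0:
            continue

        # Steps 3-4: check the preceding word against the lexicon and record a match
        prev = words[ix - 1]
        if prev in neg_lex:
            neg_words[prev] += 1
    return dict(neg_words)

print(count_before_target(["This terrible laptop."], {"terrible", "awful"}))
# prints {'terrible': 1}
```

Like the steps it follows, this sketch only looks at the first "laptop" in each line; a line that mentions the word several times would need a loop over all of its positions.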


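It looks like you want to test a function against consecutive words. Here is one way to do it, where condition is checked against each pair of consecutive words.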

text = 'Do you like bananas? Not only do I like bananas, I love bananas!'
trigger_words = {'bananas'}
positive_words = {'like', 'love'}

def condition(w):
    return w[0] in positive_words and w[1] in trigger_words

for c in '.,?!':
    text = text.replace(c, '')

words = text.lower().split()

# filter() returns a one-shot iterator; store a list so matches can be reused later
matches = list(filter(condition, zip(words, words[1:])))
n_positives = 0
for w1, w2 in matches:
    print(f'{w1.upper()} {w2} => That\'s positive !')
    n_positives += 1
print(f'This text had a score of {n_positives}')

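Output:

LIKE bananas => That's positive !
LIKE bananas => That's positive !
LOVE bananas => That's positive !
This text had a score of 3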

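Bonus:

  1. You can search for 3 consecutive words by changing zip(w, w[1:]) to zip(w, w[1:], w[2:]) and using a condition that checks 3 words.

  2. You can get a counter dictionary by doing this: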

from collections import Counter
counter = Counter((i[0] for i in matches)) # counter = {'like': 2, 'love': 1}
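
The three-word variant from the first bonus point could look something like the sketch below. The `intensifiers` set, the `condition3` name and the sample sentence are hypothetical additions for illustration; only the idea of zipping three shifted copies of the word list comes from the answer.

```python
from collections import Counter

text = "I really like bananas and I absolutely love bananas, but I like apples."
intensifiers = {"really", "absolutely"}  # hypothetical word list
positive_words = {"like", "love"}
trigger_words = {"bananas"}

def condition3(w):
    # w is a tuple of three consecutive words
    return (w[0] in intensifiers
            and w[1] in positive_words
            and w[2] in trigger_words)

for c in ".,?!":
    text = text.replace(c, "")
words = text.lower().split()

# Three shifted copies of the list give every run of three consecutive words;
# list() keeps the matches reusable after the first pass
matches = list(filter(condition3, zip(words, words[1:], words[2:])))
counter = Counter(w[1] for w in matches)

print(matches)  # [('really', 'like', 'bananas'), ('absolutely', 'love', 'bananas')]
print(counter)
```

Here "like apples" is not counted, since it is neither preceded by an intensifier nor followed by a trigger word.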
