In Python, how do I count the negative or positive words that appear before a specific word (sentiment analysis)?

def run(path):
    negWords={} #dictionary to return the count
    #load the negative lexicon
    negLex=loadLexicon('negative-words.txt')
    fin=open(path)
    for line in fin: #for every line in the file (1 review per line)
        line=line.lower().strip().split(' ')
        review_set=set() #Adding all the words in the review to a set
        for word in line: #Check if the word is present in the line
            review_set.add(word) #As it is a set, only adds one time
        for word in review_set:
            if word in negLex:
                if word in negWords:
                    negWords[word]=negWords[word]+1
                else:
                    negWords[word] = 1
    fin.close()
    return negWords

if __name__ == "__main__":
    print(run('textfile'))
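The code above calls loadLexicon, which is not shown anywhere in the thread. A minimal sketch of what such a helper might look like, assuming the lexicon file holds one word per line and that lines starting with ';' are comments (both are assumptions, not something the question states):

```python
def loadLexicon(fname):
    """Hypothetical helper: read one word per line into a set."""
    words = set()
    with open(fname, encoding="utf-8") as fin:
        for line in fin:
            word = line.strip().lower()
            # Skip blank lines and ';' comment lines (assumed file format).
            if word and not word.startswith(";"):
                words.add(word)
    return words
```

Returning a set keeps the later `word in negLex` checks fast.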

2 answers

User

#1 · edited 2024-04-26 17:41:30

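This should do what you want. It uses set and intersection (&) to avoid some of the looping. The steps are:

Get the negative words in a line
Check the position of each word
If the word after that position is "laptop", record it

Note that this only looks at the first occurrence of each negative word in a line, so "terrible terrible laptop" will not be a match.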

from collections import defaultdict

def run(path):

    negWords=defaultdict(int)  # A defaultdict(int) will start at 0, can just add.

    #load the negative lexicon
    # (loadLexicon is the asker's own helper; it is not defined here)
    negLex=loadLexicon('negative-words.txt')
    # It may return a list or a set; converting to a set works either way
    negLex = set(negLex)

    with open(path) as fin:  # the file is closed automatically on exit

        for line in fin: #for every line in the file (1 review per line)
            line=line.lower().strip().split(' ')

            # Can just pass a list to set to make a set of its items.
            review_set = set(line)

            # Compare the review set against the neglex set. We want words that are in
            # *both* sets, so we can use intersection.
            neg_words_used = review_set & negLex

            # Is the bad word followed by the word laptop?
            for word in neg_words_used:
                # Find the first occurrence of the word in the line list
                ix = line.index(word)
                if ix > len(line) - 2:
                    # Can't have laptop after it, it's the last word.
                    continue

                # Count it only if the next word in the line is laptop.
                if line[ix+1] == 'laptop':
                    negWords[word] += 1

    return negWords
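The first-occurrence limitation comes from list.index, which stops at the first match. A small sketch with a made-up review line shows the effect, along with one possible way around it (scanning every position instead of calling index):

```python
line = "terrible screen and a terrible laptop".split(' ')

# list.index returns the position of the first match only.
first = line.index('terrible')
print(first, line[first + 1])   # 0 screen

# Scanning every position catches the later occurrence as well.
hits = [i for i, word in enumerate(line[:-1])
        if word == 'terrible' and line[i + 1] == 'laptop']
print(hits)   # [4]
```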

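If you are only interested in the word that comes before "laptop", a far more sensible approach is to look for the word "laptop" and then check the word before it to see whether it is a negative word. The following example does that.

Find laptop in the current line
If laptop is not in the line, or is the first word, skip the line
Get the word before laptop and check it against the negative words
If you have a match, add it to the result

This avoids doing lookups for words that have nothing to do with laptops.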

(The code listing for this approach is missing from this copy of the answer.)
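A minimal sketch of the four steps above, reconstructed from the description rather than taken from the answer itself. The function name count_neg_before, its parameters, and the sample reviews are all made up for illustration; it takes the lines and the lexicon as arguments so it runs without the file and without loadLexicon:

```python
from collections import defaultdict

def count_neg_before(lines, neg_lex, target='laptop'):
    """Count negative words that appear directly before `target`."""
    neg_words = defaultdict(int)
    for line in lines:
        words = line.lower().strip().split(' ')
        try:
            ix = words.index(target)   # first occurrence of the target
        except ValueError:
            continue                   # target is not in this line
        if ix == 0:
            continue                   # nothing comes before the target
        previous = words[ix - 1]
        if previous in neg_lex:
            neg_words[previous] += 1
    return neg_words

reviews = ["Terrible laptop and a slow screen", "laptop is fine", "I love it"]
print(dict(count_neg_before(reviews, {'terrible', 'slow'})))   # {'terrible': 1}
```

Like the earlier version, this only inspects the first "laptop" in each line; "slow" is in the lexicon here but is not counted because it does not precede the target.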

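User

#2 · edited 2024-04-26 17:41:30

It looks like you want to check a function against consecutive words. Here is one way to do it: condition is checked against every pair of consecutive words.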

text = 'Do you like bananas? Not only do I like bananas, I love bananas!'
trigger_words = {'bananas'}
positive_words = {'like', 'love'}

def condition(w):
    return w[0] in positive_words and w[1] in trigger_words

for c in '.,?!':
    text = text.replace(c, '')

words = text.lower().split()

matches = list(filter(condition, zip(words, words[1:])))  # list, so it can be reused after the loop
n_positives = 0
for w1, w2 in matches:
    print(f'{w1.upper()} {w2} => That\'s positive !')
    n_positives += 1
print(f'This text had a score of {n_positives}')

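Output:

LIKE bananas => That's positive !
LIKE bananas => That's positive !
LOVE bananas => That's positive !
This text had a score of 3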

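Bonus:

You can search for 3 consecutive words by changing zip(w, w[1:]) to zip(w, w[1:], w[2:]) and using a condition that checks 3 words.
You can get a counter dictionary by doing the following: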

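from collections import Counter
# filter() is lazy, so build the matches again in case the loop above consumed them
matches = filter(condition, zip(words, words[1:]))
counter = Counter(w1 for w1, w2 in matches) # Counter({'like': 2, 'love': 1})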
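The three-word variant from the bonus could look roughly like this. The negations set, the condition3 name, and the sample sentence are made up for illustration; the only change to the technique is the third argument to zip:

```python
from collections import Counter

text = 'I do not like bananas but I really love bananas'
words = text.lower().split()

negations = {'not'}                # hypothetical extra word set
positive_words = {'like', 'love'}
trigger_words = {'bananas'}

def condition3(w):
    # w is a tuple of three consecutive words
    return (w[0] in negations and w[1] in positive_words
            and w[2] in trigger_words)

# list() so the matches can be iterated more than once
matches = list(filter(condition3, zip(words, words[1:], words[2:])))
print(matches)    # [('not', 'like', 'bananas')]

counter = Counter(w[1] for w in matches)
print(counter)    # Counter({'like': 1})
```

"really love bananas" is not matched because "really" is not in the negations set.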
