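Coursera Python final project: sentiment classifier

3 answers

User

#1 · edited 2024-05-13 10:43:47

Thanks for updating your question. First, I would define an entry point for the program, such as main. Then do a rough first pass at parsing the CSV (this part is very simple). The code below just prints information about each entry in the CSV, to verify that we are parsing it correctly: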

def main():

    with open("project_twitter_data.csv", "r") as file:
        # Skip the first line
        next(file)
        for tweet, retweet_count, reply_count in map(lambda line: line.strip().split(","), file):
            print(f"tweet: {tweet[:20]}...\nretweet_count: {retweet_count}\nreply_count: {reply_count}\n")
        

if __name__ == "__main__":
    main()

Output:

tweet: @twitteruser: On now...
retweet_count: 3
reply_count: 0

tweet: BUNCH of things abou...
retweet_count: 1
reply_count: 0


My CSV file has only two entries, but this should work for any number of entries (as long as the tweets contain no commas).
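If the tweets might contain commas, the standard csv module can take over the splitting, since it understands quoted fields. This is only a sketch: the helper name read_tweets and the sample rows (including the header names) are made up for illustration, and it assumes that tweets with commas are quoted in the file.

```python
import csv
import io

def read_tweets(file_obj):
    """Yield (tweet, retweet_count, reply_count) from an open CSV file object."""
    reader = csv.reader(file_obj)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed rows
        tweet, retweet_count, reply_count = row
        yield tweet, int(retweet_count), int(reply_count)

# Made-up sample data standing in for project_twitter_data.csv
sample = io.StringIO(
    'tweet_text,retweet_count,reply_count\n'
    '"Great day, really great",3,0\n'
    'Plain tweet without commas,1,2\n'
)
rows = list(read_tweets(sample))
for tweet, retweet_count, reply_count in rows:
    print(f"tweet: {tweet[:20]}...\nretweet_count: {retweet_count}\nreply_count: {reply_count}\n")
```

For the real file you would pass the object returned by open("project_twitter_data.csv", newline="") instead of the StringIO sample.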

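Next, you need to load your positive and negative words. I assume the files are not too large, so you can read all the words into a list. There are many ways to count the positive and negative words in each tweet. For example, you could split the current tweet on whitespace to get a list of "words". I put "words" in quotes because, technically, these strings may contain punctuation, so you have to account for that somehow. Another approach is to use a regular expression with word boundaries to build the word list from the current tweet. What I do below is simply look for each word as a substring of the current tweet, which is a bit naive (for example, "good" is also counted inside "goodness"). Unless there is a unit test that deliberately checks that this approach was not used, it should be good enough.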

def main():

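    with open("positive_words.txt", "r") as file:
        # Drop blank lines and ';' comment lines so they are never counted
        positive_words = [w for w in file.read().splitlines() if w and not w.startswith(";")]

    with open("negative_words.txt", "r") as file:
        negative_words = [w for w in file.read().splitlines() if w and not w.startswith(";")]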

    with open("project_twitter_data.csv", "r") as file:
        # Skip the first line
        next(file)
        for tweet, retweet_count, reply_count in map(lambda line: line.strip().split(","), file):
            positive_count = sum(tweet.count(word) for word in positive_words)
            negative_count = sum(tweet.count(word) for word in negative_words)
            net_count = positive_count - negative_count
            # Write retweet_count, reply_count, positive_count, negative_count and net_count to resulting_data.csv
            
        

if __name__ == "__main__":
    main()
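
The word-boundary alternative mentioned above could look roughly like this. It is a sketch, not the course's reference solution: count_matches and the tiny word sets are made up for illustration, and the pattern \b\w+\b treats any run of letters, digits, or underscores as one word.

```python
import re

# One "word" is a maximal run of word characters between word boundaries
WORD_RE = re.compile(r"\b\w+\b")

def count_matches(text, vocabulary):
    """Count how many whole words in text appear in vocabulary."""
    words = WORD_RE.findall(text.lower())
    return sum(1 for w in words if w in vocabulary)

# Hypothetical word lists; the real ones come from the two .txt files
positive_words = {"good", "great"}
negative_words = {"bad"}

tweet = "Great game, not bad! #goodness"
pos = count_matches(tweet, positive_words)
neg = count_matches(tweet, negative_words)
print(pos, neg, pos - neg)
```

Unlike the substring search, this ignores case and does not count "good" inside "goodness". Keeping the vocabulary in a set also makes each lookup cheap.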

User

#2 · edited 2024-05-13 10:43:47

Here is my code; it works for MAB's Coursera project:

punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", '#', '@']

def strip_punctuation(x):
    # Remove every character listed in punctuation_chars
    for i in punctuation_chars:
        x = x.replace(i, '')
    return x

def get_pos(x):
    # Count the words of x that appear in positive_words
    x = strip_punctuation(x)
    y = x.lower().split()
    count = 0
    for i in y:
        if i in positive_words:
            count = count + 1
    return count

def get_neg(x):
    # Count the words of x that appear in negative_words
    x = strip_punctuation(x)
    y = x.lower().split()
    count = 0
    for i in y:
        if i in negative_words:
            count = count + 1
    return count

positive_words = []
with open("positive_words.txt") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            positive_words.append(lin.strip())
negative_words = []
with open("negative_words.txt") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            negative_words.append(lin.strip())

with open('project_twitter_data.csv', 'r') as myfile, open("resulting_data.csv", "w") as outfile:
    outfile.write('Number of Retweets, Number of Replies, Positive Score, Negative Score, Net Score')
    outfile.write('\n')
    rows = myfile.readlines()[1:]  # skip the header row
    for line in rows:
        # The last two comma-separated fields are the counts; the rest is the tweet
        tweet, retweets, replies = line.strip().rsplit(',', 2)
        print('retweets: ', retweets, 'replies: ', replies)
        # Use the helpers so punctuation and case are handled before counting
        pos_sco = get_pos(tweet)
        neg_sco = get_neg(tweet)
        net_sco = pos_sco - neg_sco
        print('positive words: ', pos_sco, 'negative words: ', neg_sco, 'Net score: ', net_sco)
        row_string = '{}, {}, {}, {}, {}'.format(retweets, replies, pos_sco, neg_sco, net_sco)
        outfile.write(row_string)
        outfile.write('\n')
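
As an alternative to building each output row by hand, csv.writer can produce resulting_data.csv. This is a sketch under assumptions: write_results is a hypothetical helper, the scores passed in are made-up numbers, and the output here has no space after each comma, so check which header format your grader expects.

```python
import csv
import io

HEADER = ["Number of Retweets", "Number of Replies",
          "Positive Score", "Negative Score", "Net Score"]

def write_results(file_obj, results):
    """Write (retweets, replies, pos, neg) tuples plus the net score as CSV rows."""
    writer = csv.writer(file_obj, lineterminator="\n")
    writer.writerow(HEADER)
    for retweets, replies, pos, neg in results:
        writer.writerow([retweets, replies, pos, neg, pos - neg])

# Made-up scores, written to an in-memory buffer instead of a real file
buffer = io.StringIO()
write_results(buffer, [(3, 0, 2, 1), (1, 0, 0, 2)])
output = buffer.getvalue()
print(output)
```

To write the real file, pass the object returned by open("resulting_data.csv", "w", newline="") instead of the buffer.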

User

#3 · edited 2024-05-13 10:43:47

Here is a simple solution:

punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", '#', '@']

def strip_punctuation(word):
    # Remove every punctuation character, then lowercase the result
    for ch in punctuation_chars:
        word = word.replace(ch, "")
    return word.lower()

positive_words = []
with open("positive_words.txt") as pos_f:
    for lin in pos_f:
        # Skip ';' comment lines and blank lines
        if lin[0] != ';' and lin[0] != '\n':
            positive_words.append(lin.strip())

def get_pos(sentence):
    sent_lst = sentence.split(" ")
    new_lst_sent = []
    for word in sent_lst:
        new_lst_sent.append(strip_punctuation(word))
    pos_count = 0
    # Note: each listed word is counted at most once per sentence
    for word in positive_words:
        if word in new_lst_sent:
            pos_count = pos_count + 1
    return pos_count

negative_words = []
with open("negative_words.txt") as neg_f:
    for lin in neg_f:
        if lin[0] != ';' and lin[0] != '\n':
            negative_words.append(lin.strip())

def get_neg(sentence):
    sent_lst = sentence.split(" ")
    new_lst_sent = []
    for word in sent_lst:
        new_lst_sent.append(strip_punctuation(word))
    neg_count = 0
    for word in negative_words:
        if word in new_lst_sent:
            neg_count = neg_count + 1
    return neg_count

with open('project_twitter_data.csv', 'r') as myfile, open("resulting_data.csv", "w") as outfile:
    outfile.write('Number of Retweets, Number of Replies, Positive Score, Negative Score, Net Score')
    outfile.write('\n')
    rows = myfile.readlines()[1:]  # skip the header row
    for line in rows:
        # The last two comma-separated fields are the counts; the rest is the tweet
        tweet, retweets, replies = line.strip().rsplit(',', 2)
        print('retweets: ', retweets, 'replies: ', replies)
        pos_sco = get_pos(tweet)
        neg_sco = get_neg(tweet)
        net_sco = pos_sco - neg_sco
        print('positive words: ', pos_sco, 'negative words: ', neg_sco, 'Net score: ', net_sco)
        row_string = '{}, {}, {}, {}, {}'.format(retweets, replies, pos_sco, neg_sco, net_sco)
        outfile.write(row_string)
        outfile.write('\n')
