Coursera Python final project: sentiment classifier

Posted 2024-05-13 10:43:47


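Finally, copy over your previous functions and write code that opens the file project_twitter_data.csv, which contains fake generated Twitter data (the text of a tweet, the number of retweets of that tweet, and the number of replies to that tweet). Your task is to build a sentiment classifier that detects how positive or negative each tweet is. Copy the code from the code windows above and put it at the top of this code window. You will now write code to create a CSV file called resulting_data.csv, which contains the Number of Retweets, Number of Replies, Positive Score (how many happy words are in the tweet), Negative Score (how many angry words are in the tweet), and Net Score (how positive or negative the text is overall) for each tweet. The file should have those headers in that order. Remember that there is another component to this project: you will upload the CSV file to Excel or Google Sheets and produce a graph of Net Score vs. Number of Retweets. If you are accessing this textbook from Coursera, check the assignment section on Coursera.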

I need help answering this question. I have been stuck on it for about a week. Please help; this is the final project.

punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", '#', '@']
def strip_punctuation(a):
    for x in punctuation_chars:
        if x in a:
            a = a.replace(x,"")
    return(a)
positive_words = []
with open("positive_words.txt") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            positive_words.append(lin.strip())
def get_pos(c):
    pos = 0
    b = c.lower()
    b = strip_punctuation(b)
    lst = b.split(" ")
    for i in positive_words:
        for j in lst:
            if i == j:
                pos+=1
    return pos
negative_words = []
with open("negative_words.txt") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            negative_words.append(lin.strip())
def get_neg(c):
    neg = 0
    b = c.lower()
    b = strip_punctuation(b)
    lst = b.split(" ")
    for i in negative_words:
        for j in lst:
            if i == j:
                neg+=1
    return neg
file = open("project_twitter_data.csv", "r")
e = file.read()
nega = posi = 0
for f in e:
    nega += get_neg(f)
    negat = nega*-1
    posi += get_pos(f)
negat = nega*-1
ne = str(nega)
po = str(posi)
net = posi + negat
netd = str(net)
filer = open('resulting_data.csv','w')
result = filer.write('Number of Retweets, Number of Replies, Positive Score, Negtive Score, Net Score\n')
result = filer.write('0, 0, ' + ne +', ' + po +", " + netd + '\n')

This is all I could come up with. I can't use import csv here; the environment doesn't allow it.

Some of the positive words:

a+ abound abounds abundance abundant accessable accessible acclaim acclaimed acclamation
These words are stored in the file positive_words.txt.

Some of the negative words:

2-faced 2-faces abnormal abolish abominable abominably abominate abomination abort

These words are stored in negative_words.txt.

The Twitter data:

tweet_text,retweet_count,reply_count
@twitteruser: On now - @Fusion scores first points #FirstFinals @overwatchleague @umich @umsi Michigan Athletics made out of emojis. #GoBlue,3,0
BUNCH of things about crisis respons… available July 8th… scholarship focuses on improving me… in North America! A s… and frigid temperatures,1,0
FREE ice cream with these local area deals: chance to

Also, after this, I have to save the result in a file in CSV format.


3 Answers

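Thanks for updating your question. First, I would define an entry point for the program, for example main. Then do a preliminary (and very simple) parse of the CSV. This just prints information about each entry in the CSV to verify that we are parsing it correctly: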

def main():

    with open("project_twitter_data.csv", "r") as file:
        # Skip the first line
        next(file)
        for tweet, retweet_count, reply_count in map(lambda line: line.strip().split(","), file):
            print(f"tweet: {tweet[:20]}...\nretweet_count: {retweet_count}\nreply_count: {reply_count}\n")
        

if __name__ == "__main__":
    main()

Output:

tweet: @twitteruser: On now...
retweet_count: 3
reply_count: 0

tweet: BUNCH of things abou...
retweet_count: 1
reply_count: 0


My CSV file has only two entries, but this should work for any number of entries (as long as no tweet contains a comma).
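If tweets may contain commas, one option is to split from the right, since the two counts are always the last two fields of a row. A minimal sketch, assuming every data row ends in `,<retweet_count>,<reply_count>` and that fields are not quoted; `parse_row` is a hypothetical helper name, not something from the assignment:

```python
def parse_row(line):
    """Split one CSV line into (tweet, retweet_count, reply_count).

    Assumes the last two comma-separated fields are integer counts, so
    commas inside the tweet text are left untouched.
    """
    tweet, retweet_count, reply_count = line.strip().rsplit(",", 2)
    return tweet, int(retweet_count), int(reply_count)


# Made-up sample row with commas inside the tweet text.
sample = "FREE ice cream, today only, downtown,1,0\n"
print(parse_row(sample))
```

This avoids the csv module, which the question says is unavailable, but it would not cope with quoted fields or escaped delimiters.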

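Next, you need to load your positive and negative words. I'm assuming the files aren't too large, so you can read all the words into lists. There are many ways to count the positive and negative words in each tweet. For example, you could split the current tweet on whitespace to get a list of "words". I say "words" because these strings may technically contain punctuation, so you would have to account for that somehow. Another approach is to use a regular expression pattern with word boundaries to produce the list of words from the current tweet. What I do below is simply look for substrings in the current tweet, which is a bit naive. Unless there is a unit test that deliberately checks that this approach was not used, it should be good enough: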

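def main():

    # Skip blank lines and ';' comment lines (the loader in the question suggests the
    # word files contain them); an empty string would match at every position in str.count
    with open("positive_words.txt", "r") as file:
        positive_words = [word for word in file.read().splitlines() if word and not word.startswith(";")]

    with open("negative_words.txt", "r") as file:
        negative_words = [word for word in file.read().splitlines() if word and not word.startswith(";")]

    with open("project_twitter_data.csv", "r") as file, open("resulting_data.csv", "w") as outfile:
        outfile.write("Number of Retweets, Number of Replies, Positive Score, Negative Score, Net Score\n")
        # Skip the first line
        next(file)
        for tweet, retweet_count, reply_count in map(lambda line: line.strip().split(","), file):
            # Lowercase the tweet so that matching is case-insensitive
            tweet = tweet.lower()
            positive_count = sum(tweet.count(word) for word in positive_words)
            negative_count = sum(tweet.count(word) for word in negative_words)
            net_count = positive_count - negative_count
            # Write retweet_count, reply_count, positive_count, negative_count and net_count to resulting_data.csv
            outfile.write(f"{retweet_count}, {reply_count}, {positive_count}, {negative_count}, {net_count}\n")


if __name__ == "__main__":
    main()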
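The word-boundary alternative mentioned in this answer can be sketched roughly as follows. This is only an illustration, not the answerer's code: `count_matches` is a hypothetical helper, and the tiny `positive` set stands in for the real word list.

```python
import re


def count_matches(tweet, lexicon):
    """Count whole-word tokens of `tweet` that appear in `lexicon`.

    `lexicon` is assumed to be a set of lowercase words.
    """
    tokens = re.findall(r"\b\w+\b", tweet.lower())
    return sum(1 for token in tokens if token in lexicon)


# Stand-in lexicon for illustration only.
positive = {"abound", "great"}

# Substring counting also finds "abound" inside "abounds".
print("Good things abounds".count("abound"))
# Token matching only counts whole words.
print(count_matches("Good things abounds", positive))
print(count_matches("Great, GREAT day!", positive))
```

One limitation of this pattern: list entries containing non-word characters, such as a+ or 2-faced, are never produced as tokens by `\w+`, so they would need a different pattern or separate handling.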

Here is my code; it worked for the MAB Coursera project:

punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", '#', '@']

def strip_punctuation(x):
    for i in punctuation_chars:
        x = x.replace(i, '')
    return x
def get_pos(x):
    x = strip_punctuation(x)
    y = x.lower().split()
    count = 0
    for i in y:
        if i in positive_words:
            count = count + 1
    return count
def get_neg(x):
    x = strip_punctuation(x)
    y = x.lower().split()
    count = 0
    for i in y:
        if i in negative_words:
            count = count + 1
    return count

positive_words = []
with open("positive_words.txt") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            positive_words.append(lin.strip())
negative_words = []
with open("negative_words.txt") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            negative_words.append(lin.strip())

with open('project_twitter_data.csv', 'r') as myfile, open("resulting_data.csv", "w") as outfile:
    outfile.write('Number of Retweets, Number of Replies, Positive Score, Negative Score, Net Score')
    outfile.write('\n')
    # Skip the header row of the input file
    rows = myfile.readlines()[1:]
    for line in rows:
        if not line.strip():
            continue
        # The last two comma-separated fields are the counts; everything before them is the tweet
        tweet, retweets, replies = line.strip().rsplit(',', 2)
        print ('retweets: ', retweets, 'replies: ', replies)
        # Score with get_pos/get_neg so punctuation is stripped and case is ignored
        pos_sco = get_pos(tweet)
        neg_sco = get_neg(tweet)
        net_sco = pos_sco - neg_sco
        print ('positive words: ', pos_sco, 'negative words: ', neg_sco, 'Net score: ', net_sco )
        row_string = '{}, {}, {}, {}, {}'.format(retweets, replies, pos_sco, neg_sco, net_sco)
        outfile.write(row_string)
        outfile.write('\n')

Here is a simple solution:

punctuation_chars = ["'", '"', ",", ".", "!", ":", ";", '#', '@']
def strip_punctuation(word):
    for ch in punctuation_chars:
        word = word.replace(ch, "")
    return word.lower()
positive_words = []
with open("positive_words.txt") as pos_f:
    for lin in pos_f:
        if lin[0] != ';' and lin[0] != '\n':
            positive_words.append(lin.strip())
def get_pos(sentence):
    sent_lst = sentence.split(" ")
    new_lst_sent = []
    for word in sent_lst:
        word = strip_punctuation(word)
        new_lst_sent.append(word)
    pos_count = 0
    # Count every occurrence, so a positive word used twice scores twice
    for word in new_lst_sent:
        if word in positive_words:
            pos_count = pos_count + 1
    return pos_count

negative_words = []
with open("negative_words.txt") as neg_f:
    for lin in neg_f:
        if lin[0] != ';' and lin[0] != '\n':
            negative_words.append(lin.strip())
def get_neg(sentence):
    sent_lst = sentence.split(" ")
    new_lst_sent = []
    for word in sent_lst:
        word = strip_punctuation(word)
        new_lst_sent.append(word)
    neg_count = 0
    # Count every occurrence, so a negative word used twice scores twice
    for word in new_lst_sent:
        if word in negative_words:
            neg_count = neg_count + 1
    return neg_count

with open('project_twitter_data.csv', 'r') as myfile, open("resulting_data.csv", "w") as outfile:
    outfile.write('Number of Retweets, Number of Replies, Positive Score, Negative Score, Net Score')
    outfile.write('\n')
    # Skip the header row of the input file
    rows = myfile.readlines()[1:]
    for line in rows:
        if not line.strip():
            continue
        # The last two comma-separated fields are the counts; everything before them is the tweet
        tweet, retweets, replies = line.strip().rsplit(',', 2)
        print ('retweets: ', retweets, 'replies: ', replies)
        pos_sco = get_pos(tweet)
        neg_sco = get_neg(tweet)
        net_sco = pos_sco - neg_sco
        print ('positive words: ', pos_sco, 'negative words: ', neg_sco, 'Net score: ', net_sco )
        row_string = '{}, {}, {}, {}, {}'.format(retweets, replies, pos_sco, neg_sco, net_sco)
        outfile.write(row_string)
        outfile.write('\n')
