Most efficient way to apply a function to a DataFrame column

def sentiment_scorer(row):
    pos = neg = 0
    for item in row['lyrics']:
        # count positive words
        if item in positiv:
            pos += 1
        # count negative words
        elif item in negativ:
            neg += 1
        # ignore words that are neither negative nor positive
        else:
            pass

    # set sentiment to 0 if pos is 0
    if pos < 1:
        pos_sent = 0
    else:
        pos_sent = pos / len(row['lyrics'])

    # set sentiment to 0 if neg is 0
    if neg < 1:
        neg_sent = 0
    else:
        neg_sent = neg / len(row['lyrics'])

    # return positive and negative sentiment to make new columns
    return pos_sent, neg_sent

# chunk data frames
n = 1000
list_df = [lyrics_cleaned_df[i:i+n] for i in range(0, lyrics_cleaned_df.shape[0], n)]

for lr in range(len(list_df)):
    # credit for method: toto_tico on Stack Overflow https://stackoverflow.com/a/46197147
    list_df[lr]['positive_sentiment'], list_df[lr]['negative_sentiment'] = zip(*list_df[lr].apply(sentiment_scorer, axis=1))
    list_df[lr]['net_sentiment'] = list_df[lr]['positive_sentiment'] - list_df[lr]['negative_sentiment']

data = [['ego-remix', 2009, 'beyonce-knowles', 'Pop', ['oh', 'baby', 'how']],
        ['then-tell-me', 2009, 'beyonce-knowles', 'Pop', ['playin', 'everything', 'so']],
        ['honesty', 2009, 'beyonce-knowles', 'Pop', ['if', 'you', 'search']]]

df = pd.DataFrame(data, columns = ['song', 'year', 'artist', 'genre', 'lyrics'])

1 Answer

User

#1 · Posted 2024-04-29 16:08:25

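If I understand the question correctly, and using your example (I added a word so that the lists have uneven lengths), you can create a separate DataFrame, lyrics, that spreads the words of each song's lyrics into separate columns.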

import pandas as pd

data = [['ego-remix', 2009, 'beyonce-knowles', 'Pop', ['oh', 'baby', 'how', "d"]],
        ['then-tell-me', 2009, 'beyonce-knowles', 'Pop', ['playin', 'everything', 'so']], 
        ['honesty', 2009, 'beyonce-knowles', 'Pop', ['if', 'you', 'search']]]

df = pd.DataFrame(data, columns = ['song', 'year', 'artist', 'genre', 'lyrics'])

Then define lyrics:

lyrics = pd.DataFrame(df.lyrics.values.tolist())

#           0            1       2      3
# 0        oh         baby     how      d
# 1    playin   everything      so   None   # Null rows need to be accounted for 
# 2        if          you  search   None   # Null rows need to be accounted for
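The None padding can be made explicit by counting the non-null cells in each row, which should equal the length of each original list. A minimal sketch, assuming the same sample lyrics as above (the names lyrics_lists and word_counts are introduced here for illustration only):

```python
import pandas as pd

lyrics_lists = [['oh', 'baby', 'how', 'd'],
                ['playin', 'everything', 'so'],
                ['if', 'you', 'search']]

# Rows shorter than the longest list are padded with None on the right
lyrics = pd.DataFrame(lyrics_lists)

# Number of real words per row (non-null cells)
word_counts = lyrics.notna().sum(axis=1)
print(word_counts.tolist())  # [4, 3, 3]
```

These per-row counts are the denominators that the sentiment proportions need, rather than the total number of columns.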

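Then, if you have two lists of positive and negative sentiment words, as shown below, you can compute the sentiment of each row (each song's lyrics) with the mean() method.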

# positive and negative sentiment words
pos = ['baby', 'you']
neg = ['if', 'so']

# When the lyrics lists are converted to a new DataFrame, shorter rows are
# padded with null values. mean() divides by the total number of columns, so
# the result must be rescaled by the fraction of non-null cells in each row
non_null_frac = lyrics.notnull().mean(axis=1)

# Proportion of positive and negative words per row, corrected for the padding
pos_sent = lyrics.isin(pos).mean(axis=1) / non_null_frac
neg_sent = lyrics.isin(neg).mean(axis=1) / non_null_frac

# pos_sent
# 0    0.250000
# 1    0.000000
# 2    0.333333

# neg_sent 
# 0    0.000000
# 1    0.333333
# 2    0.333333

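If I've understood your question correctly, you should then be able to assign df['pos'] = pos_sent and df['neg'] = neg_sent. I suspect there may still be some issues, so let me know whether this is in the right ballpark.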
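One issue that may come up on assignment is index alignment: a frame built from a plain list gets a fresh 0..n-1 index, so the results only line up with df if df has the same index. The sketch below is one way to put the pieces together under that assumption. It passes index=df.index when building the wide frame, uses the question's column names, and fills with 0 for rows whose lyrics list is empty. The index values 10, 20, 30 and the name non_null_frac are made up for illustration.

```python
import pandas as pd

data = [
    ['ego-remix', 2009, 'beyonce-knowles', 'Pop', ['oh', 'baby', 'how', 'd']],
    ['then-tell-me', 2009, 'beyonce-knowles', 'Pop', ['playin', 'everything', 'so']],
    ['honesty', 2009, 'beyonce-knowles', 'Pop', ['if', 'you', 'search']],
]
# Hypothetical non-default index, to show why alignment matters
df = pd.DataFrame(data,
                  columns=['song', 'year', 'artist', 'genre', 'lyrics'],
                  index=[10, 20, 30])

pos = ['baby', 'you']
neg = ['if', 'so']

# Reuse df's index so the computed Series align with df on assignment
lyrics = pd.DataFrame(df['lyrics'].tolist(), index=df.index)
non_null_frac = lyrics.notna().mean(axis=1)

# A row with an empty lyrics list gives 0/0 = NaN; fill with 0 to match
# the behaviour of the row-wise function in the question
df['positive_sentiment'] = (lyrics.isin(pos).mean(axis=1) / non_null_frac).fillna(0)
df['negative_sentiment'] = (lyrics.isin(neg).mean(axis=1) / non_null_frac).fillna(0)
df['net_sentiment'] = df['positive_sentiment'] - df['negative_sentiment']

print(df[['song', 'positive_sentiment', 'negative_sentiment', 'net_sentiment']])
```

Because every step operates on whole columns, the manual chunking and the row-wise apply from the question should not be needed here, although very long lyrics make the padded frame wide and memory use is worth checking on the real data.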
