Get the index of the matching record

def cal_nega_mean(my_string):
    mean_tot = 0
    mean_sum = 0.00
    for word in my_string.split():
        if word in df.values:
            # if the word is found here, I would like to get its index,
            # so that I don't have to use the for loop on the next line
            for index, row in df.iterrows():  # want to change
                if word == row.word:          # this part
                    if row['value'] < -0.40:
                        mean_tot += 1
                        mean_sum += row['value']
                    break
    if mean_tot == 0:
        return 0
    mean = mean_sum / mean_tot
    return round(mean, 2)

my_string = "i have a problem with my python code"
cal_nega_mean(my_string)

# and I am using this to get the result for all records
df_tweets['intensity'] = df_tweets['tweets'].apply(lambda row: cal_nega_mean(row))

3 Answers

User

#1 · Edited 2024-04-25 04:03:05

Pandas has some useful text-processing functions that should help you here. I suggest using pd.Series.str.contains().

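import re

def cal_nega_mean(my_string):
    # Escape each word and anchor the pattern so that only whole words match
    words = '|'.join(re.escape(word) for word in my_string.split())
    matches = df['word'].str.contains(f'^(?:{words})$', regex=True, na=False)
    # Keep the strongly negative rows, as in the question (value < -0.40);
    # the value test is unnecessary if you drop the other rows beforehand
    mask = (df['value'] < -0.40) & matches
    mean_tot = mask.sum()
    if mean_tot == 0:  # nothing matched: avoid dividing by zero
        return 0
    mean_sum = df.loc[mask, 'value'].sum()
    return round(mean_sum / mean_tot, 2)

Unrelated, but I would also suggest dropping the rows whose 'value' is >= -0.40 up front, since you ignore them anyway.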

I haven't had a chance to test this, but it should do the job, and it is vectorized.
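One thing to watch with str.contains: it looks for a regex match anywhere in the string, so a short word can match as a substring of a longer one. Here is a minimal sketch on invented data (the lexicon contents are made up). The isin comparison is an alternative added for illustration, not part of the suggestion above.

```python
import re
import pandas as pd

# Hypothetical lexicon; words and values are invented for illustration
df = pd.DataFrame({
    'word': ['bad', 'badly', 'problem', 'good'],
    'value': [-0.60, -0.50, -0.45, 0.70],
})

words = 'a bad day'.split()
pattern = '|'.join(re.escape(w) for w in words)

# Unanchored: 'a' and 'bad' are found inside 'badly' as well
loose = df['word'].str.contains(pattern, regex=True)

# Anchored with a non-capturing group: whole-word matches only
exact = df['word'].str.contains(f'^(?:{pattern})$', regex=True)

# For plain whole-word matching, isin gives the same mask without a regex
by_isin = df['word'].isin(words)

print(loose.tolist())
print(exact.tolist())
print(by_isin.tolist())
```

If the lexicon words never need pattern matching, isin is probably the simpler choice. The anchored regex is only worth it when you actually want regex features.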

User

#2 · Edited 2024-04-25 04:03:05

You can try i = df[df.word == word].index[0] to get the index of the first row that satisfies df.word == word. Once you have the index, you can pull out that row with df.loc.

def cal_nega_mean(my_string):
    mean_tot = 0
    mean_sum = 0.00
    for word in my_string.split():
        try:
            # index label of the first row whose word matches
            i = df[df.word == word].index[0]
        except IndexError:  # the word is not in df
            continue
        row = df.loc[i]
        if row['value'] < -0.40:
            mean_tot += 1
            mean_sum += row['value']
    if mean_tot == 0:
        return 0
    mean = mean_sum / mean_tot
    return round(mean, 2)
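To show what the lookup returns, here is a small sketch on invented data. The lexicon and its non-default index labels are assumptions for illustration. Note that .index gives index labels rather than positions, and that indexing an empty result raises IndexError.

```python
import pandas as pd

# Hypothetical lexicon with a non-default index, to show that labels are returned
df = pd.DataFrame(
    {'word': ['problem', 'good', 'problem'], 'value': [-0.45, 0.70, -0.90]},
    index=[10, 11, 12],
)

matches = df[df.word == 'problem'].index  # labels of every matching row
first = matches[0]                        # label of the first match
row = df.loc[first]                       # the row itself, as a Series

# A word that is absent yields an empty Index; [0] on it raises IndexError
missing = df[df.word == 'python'].index
try:
    missing[0]
    found = True
except IndexError:
    found = False

print(list(matches), first, row['value'], found)
```

Each lookup still scans the whole column, so this removes the explicit iterrows loop but not the per-word cost.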

User

#3 · Edited 2024-04-25 04:03:05

Here is an approach that uses a dictionary: you can convert the word/value pairs into a key-value store and use it as a lookup:

# Build the lookup once: word -> value
word_look_up = dict(zip(df['word'], df['value']))


def cal_nega_mean(my_string):
    mean_tot = 0
    mean_sum = 0.00
    for word in my_string.split():
        value = word_look_up.get(word)  # None if the word is not in df
        if value is not None and value < -0.40:
            mean_tot += 1
            mean_sum += value

    if mean_tot == 0:  # no strongly negative word found
        return 0
    return round(mean_sum / mean_tot, 2)


df_tweets['intensity'] = df_tweets['tweets'].apply(cal_nega_mean)
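As a quick end-to-end check of the dictionary idea, here is a sketch on invented data. The lexicon, the tweets, the helper name negative_mean and its threshold parameter are all hypothetical, chosen so the averages are easy to verify by hand.

```python
import pandas as pd

# Hypothetical data, invented for illustration
df = pd.DataFrame({'word': ['problem', 'hate', 'good'],
                   'value': [-0.5, -1.0, 0.75]})
df_tweets = pd.DataFrame({'tweets': [
    'i have a problem with my python code',
    'i hate this problem',
    'good morning',
]})

# One pass over the lexicon; each later lookup is a constant-time dict access
word_look_up = dict(zip(df['word'], df['value']))


def negative_mean(text, threshold=-0.40):
    # Collect the values of known words that fall below the threshold
    values = [word_look_up[w] for w in text.split()
              if w in word_look_up and word_look_up[w] < threshold]
    return round(sum(values) / len(values), 2) if values else 0


df_tweets['intensity'] = df_tweets['tweets'].apply(negative_mean)
print(df_tweets)
```

The second tweet averages -1.0 and -0.5 to -0.75, and the third gets 0 because it contains no word below the threshold.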
