Adding a new column based on other row values in a dataframe

Posted 2024-06-16 18:43:32

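I have a dataframe named test and a word list list_w = ['monthly', 'moon']. I want to add a new column, "revised cosine", such that: for each word in list_w, if the rows with condition weak have cosine == 'Na', then the revised cosine of the corresponding unrel_weak rows is also 'Na'; likewise, for each word in list_w, if the rows with condition strong have cosine == 'Na', then the revised cosine of the corresponding unrel_strong rows is also 'Na'.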

     isi       prime   target     condition  meanRT cosine 
0     50      weekly  monthly        strong   676.2    0.9
1   1050      weekly  monthly        strong   643.5    0.9
2     50       daily  monthly          weak   737.2     Na
3   1050       daily  monthly          weak   670.6     Na
4     50     bathtub  monthly  unrel_strong   692.2    0.1
5   1050     bathtub  monthly  unrel_strong   719.1    0.1
6     50      sponge  monthly    unrel_weak   805.8    0.3
7   1050      sponge  monthly    unrel_weak   685.7    0.3
8     50    crescent     moon        strong   625.0     Na
9   1050    crescent     moon        strong   537.2     Na
10    50      sunset     moon          weak   698.4    0.2
11  1050      sunset     moon          weak   704.3    0.2
12    50    premises     moon  unrel_strong   779.2    0.7
13  1050    premises     moon  unrel_strong   647.6    0.7
14    50     descent     moon    unrel_weak   686.0    0.5
15  1050     descent     moon    unrel_weak   725.4    0.5
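
To reproduce the example, the table above can be rebuilt roughly as follows. This is only a sketch: storing cosine as strings, with the literal string 'Na' marking missing values, is an assumption based on the comparison cosine == 'Na' used in the question.

```python
import pandas as pd

# Values copied from the table above. Keeping "cosine" as strings
# (with the literal "Na") is an assumption about the original data.
rows = [
    (50, "weekly", "monthly", "strong", 676.2, "0.9"),
    (1050, "weekly", "monthly", "strong", 643.5, "0.9"),
    (50, "daily", "monthly", "weak", 737.2, "Na"),
    (1050, "daily", "monthly", "weak", 670.6, "Na"),
    (50, "bathtub", "monthly", "unrel_strong", 692.2, "0.1"),
    (1050, "bathtub", "monthly", "unrel_strong", 719.1, "0.1"),
    (50, "sponge", "monthly", "unrel_weak", 805.8, "0.3"),
    (1050, "sponge", "monthly", "unrel_weak", 685.7, "0.3"),
    (50, "crescent", "moon", "strong", 625.0, "Na"),
    (1050, "crescent", "moon", "strong", 537.2, "Na"),
    (50, "sunset", "moon", "weak", 698.4, "0.2"),
    (1050, "sunset", "moon", "weak", 704.3, "0.2"),
    (50, "premises", "moon", "unrel_strong", 779.2, "0.7"),
    (1050, "premises", "moon", "unrel_strong", 647.6, "0.7"),
    (50, "descent", "moon", "unrel_weak", 686.0, "0.5"),
    (1050, "descent", "moon", "unrel_weak", 725.4, "0.5"),
]
columns = ["isi", "prime", "target", "condition", "meanRT", "cosine"]
test = pd.DataFrame(rows, columns=columns)
list_w = ["monthly", "moon"]

print(test.head())
```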

My code is as follows:

for w in list_w:
    if test.loc[(test['target']==w) & (test['condition']=='strong'), 'cosine']=='Na':
        test.loc[(test['target']==w) & (test['condition']=='unrel_strong'), 'cosine'] ='Na'

My code raises this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

My expected output should look like the dataframe below (with the "revised cosine" column added):

     isi       prime   target     condition  meanRT cosine revised cosine
0     50      weekly  monthly        strong   676.2    0.9  0.9
1   1050      weekly  monthly        strong   643.5    0.9  0.9
2     50       daily  monthly          weak   737.2     Na  Na
3   1050       daily  monthly          weak   670.6     Na  Na
4     50     bathtub  monthly  unrel_strong   692.2    0.1  0.1
5   1050     bathtub  monthly  unrel_strong   719.1    0.1  0.1
6     50      sponge  monthly    unrel_weak   805.8    0.3  Na
7   1050      sponge  monthly    unrel_weak   685.7    0.3  Na
8     50    crescent     moon        strong   625.0     Na  Na
9   1050    crescent     moon        strong   537.2     Na  Na
10    50      sunset     moon          weak   698.4    0.2  0.2
11  1050      sunset     moon          weak   704.3    0.2  0.2
12    50    premises     moon  unrel_strong   779.2    0.7  Na
13  1050    premises     moon  unrel_strong   647.6    0.7  Na
14    50     descent     moon    unrel_weak   686.0    0.5  0.5
15  1050     descent     moon    unrel_weak   725.4    0.5  0.5

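Is there any way to do this? I looked into logical_, but it seems to work only with two conditions. Overwriting the cosine column is also fine, as long as the output looks like revised cosine. Thanks in advance.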


2 Answers

Solution

m = test['cosine'].eq('Na') & \
    test['target'].isin(list_w) & \
    test['condition'].isin(['weak', 'strong'])

i1 = test.set_index(['isi', 'target', 'condition']).index
i2 = test[m].set_index(['isi', 'target', test.loc[m, 'condition'].radd('unrel_')]).index

test['revised_cosine'] = test['cosine'].mask(i1.isin(i2), 'Na')

Explanation

Let's create a boolean mask m that is True where the cosine column contains Na, the target column contains a word from list_w, and the condition column is either weak or strong.

>>> m

0     False
1     False
2      True
3      True
4     False
5     False
6     False
7     False
8      True
9      True
10    False
11    False
12    False
13    False
14    False
15    False
dtype: bool

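Create a MultiIndex from the isi, target, and condition columns and call it i1. Then filter the rows of the test dataframe with the mask m, add the prefix unrel_ to the condition values of the filtered rows, and build a second MultiIndex, i2, in the same way.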

>>> i1
MultiIndex([(  50, 'monthly',       'strong'),
            (1050, 'monthly',       'strong'),
            (  50, 'monthly',         'weak'),
            (1050, 'monthly',         'weak'),
            (  50, 'monthly', 'unrel_strong'),
            (1050, 'monthly', 'unrel_strong'),
            (  50, 'monthly',   'unrel_weak'),
            (1050, 'monthly',   'unrel_weak'),
            (  50,    'moon',       'strong'),
            (1050,    'moon',       'strong'),
            (  50,    'moon',         'weak'),
            (1050,    'moon',         'weak'),
            (  50,    'moon', 'unrel_strong'),
            (1050,    'moon', 'unrel_strong'),
            (  50,    'moon',   'unrel_weak'),
            (1050,    'moon',   'unrel_weak')],
           names=['isi', 'target', 'condition'])

>>> i2
MultiIndex([(  50, 'monthly',   'unrel_weak'),
            (1050, 'monthly',   'unrel_weak'),
            (  50,    'moon', 'unrel_strong'),
            (1050,    'moon', 'unrel_strong')],
           names=['isi', 'target', 'condition'])
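
The two building blocks used here, Series.radd for prefixing strings and MultiIndex.isin for row-wise membership tests, can be tried in isolation. The toy values below are made up for illustration and are not part of the original data.

```python
import pandas as pd

# radd computes other + value, so the prefix ends up in front of each string
cond = pd.Series(["weak", "strong"])
prefixed = cond.radd("unrel_")
print(prefixed.tolist())

# MultiIndex.isin compares whole tuples, one boolean per row of all_keys
names = ["isi", "target", "condition"]
all_keys = pd.MultiIndex.from_tuples(
    [(50, "moon", "strong"), (50, "moon", "unrel_strong"), (50, "moon", "weak")],
    names=names,
)
wanted = pd.MultiIndex.from_tuples([(50, "moon", "unrel_strong")], names=names)
hits = all_keys.isin(wanted)
print(hits)
```

Because whole tuples are compared, a row only matches when isi, target, and condition all agree.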

Finally, mask the values in the cosine column using a boolean mask obtained by testing the membership of i1 in i2:

     isi     prime   target     condition  meanRT cosine revised_cosine
0     50    weekly  monthly        strong   676.2    0.9            0.9
1   1050    weekly  monthly        strong   643.5    0.9            0.9
2     50     daily  monthly          weak   737.2     Na             Na
3   1050     daily  monthly          weak   670.6     Na             Na
4     50   bathtub  monthly  unrel_strong   692.2    0.1            0.1
5   1050   bathtub  monthly  unrel_strong   719.1    0.1            0.1
6     50    sponge  monthly    unrel_weak   805.8    0.3             Na
7   1050    sponge  monthly    unrel_weak   685.7    0.3             Na
8     50  crescent     moon        strong   625.0     Na             Na
9   1050  crescent     moon        strong   537.2     Na             Na
10    50    sunset     moon          weak   698.4    0.2            0.2
11  1050    sunset     moon          weak   704.3    0.2            0.2
12    50  premises     moon  unrel_strong   779.2    0.7             Na
13  1050  premises     moon  unrel_strong   647.6    0.7             Na
14    50   descent     moon    unrel_weak   686.0    0.5            0.5
15  1050   descent     moon    unrel_weak   725.4    0.5            0.5
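
Putting the three steps together, the approach above could be wrapped in a small function. The name add_revised_cosine and the reduced four-row frame are hypothetical, chosen only to keep the sketch short; the logic is the same as in the solution.

```python
import pandas as pd


def add_revised_cosine(test, list_w):
    """Hypothetical helper wrapping the three steps described above."""
    # Step 1: weak/strong rows of the listed words whose cosine is "Na"
    m = (
        test["cosine"].eq("Na")
        & test["target"].isin(list_w)
        & test["condition"].isin(["weak", "strong"])
    )
    # Step 2: keys of every row, and keys of the unrel_ rows to overwrite
    i1 = test.set_index(["isi", "target", "condition"]).index
    i2 = test[m].set_index(
        ["isi", "target", test.loc[m, "condition"].radd("unrel_")]
    ).index
    # Step 3: replace cosine with "Na" wherever the row key appears in i2
    out = test.copy()
    out["revised_cosine"] = test["cosine"].mask(i1.isin(i2), "Na")
    return out


# Reduced frame for illustration: only the "moon" rows at isi == 50
demo = pd.DataFrame({
    "isi": [50, 50, 50, 50],
    "target": ["moon"] * 4,
    "condition": ["strong", "weak", "unrel_strong", "unrel_weak"],
    "cosine": ["Na", "0.2", "0.7", "0.5"],
})
result = add_revised_cosine(demo, ["moon"])
print(result)
```

Returning a copy rather than modifying test in place is a design choice of this sketch, not something the original answer prescribes.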

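This error comes from the fact that you cannot use a pandas Series in an if statement like that: the comparison returns a whole Series of booleans rather than a single True or False.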
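
A minimal sketch of why the if statement fails, using a made-up two-element Series rather than the test dataframe: the comparison yields one boolean per row, and pandas refuses to collapse that into a single truth value unless you say how, for example with .any() or .all().

```python
import pandas as pd

s = pd.Series(["Na", "Na"])
comparison = s == "Na"  # a boolean Series, one value per row

message = ""
try:
    if comparison:  # pandas cannot decide between "any" and "all" here
        pass
except ValueError as err:
    message = str(err)
    print(message)

# Reducing the Series explicitly gives a single boolean that `if` accepts
all_na = bool(comparison.all())
any_na = bool(comparison.any())
print(all_na, any_na)
```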

Try something like this:

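# Start from the original values, then overwrite the unrelated rows
test["revised cosine"] = test["cosine"]
for w in list_w:
    for c in ["weak", "strong"]:
        is_word = test["target"] == w
        na_rows = is_word & (test["condition"] == c) & (test["cosine"] == "Na")
        # If this word's weak/strong rows are "Na", mark its unrel_ rows too
        if na_rows.any():
            test.loc[is_word & (test["condition"] == "unrel_" + c), "revised cosine"] = "Na"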
