Manipulating row values of a sub-DataFrame in pandas

Posted 2024-04-23 11:50:36

I have a DataFrame df1 like this:

    ID1    ID2
0   foo    bar
1   fizz   buzz

and another DataFrame, df2, like this:

    ID1    ID2    Count    Code   
0   abc    def      7        B
1   fizz   buzz     5        B
2   fizz1  buzz2    9        C
3   foo    bar      6        B
4   foo    bar      6        Z

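What I want to do is filter the second DataFrame down to the rows whose ID1 and ID2 together match a row of the first DataFrame, call the result sub_df, and then apply the following code to it: sub_df.loc[sub_df["Count"] >= 5, "Code"] = "A"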

sub_df:

    ID1    ID2    Count    Code   
1   fizz   buzz     5        B
3   foo    bar      6        B
4   foo    bar      6        Z

Finally, I want to produce a DataFrame df like this:

    ID1    ID2    Count    Code   
0   abc    def      7        B
1   fizz   buzz     5        A
2   fizz1  buzz2    9        C
3   foo    bar      6        A
4   foo    bar      6        A

How can I do this? Many thanks.
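A minimal sketch of the two-step approach exactly as the question describes it (filter to sub_df, update it, then carry the changes back), assuming the sample data shown above. Boolean filtering returns a copy, so updating sub_df on its own never changes df2; the sketch therefore writes the new codes back by index label. The MultiIndex-based pair filter and the names keys and mask are illustrative choices, not part of the question.

```python
import pandas as pd

df1 = pd.DataFrame({"ID1": ["foo", "fizz"], "ID2": ["bar", "buzz"]})
df2 = pd.DataFrame({
    "ID1": ["abc", "fizz", "fizz1", "foo", "foo"],
    "ID2": ["def", "buzz", "buzz2", "bar", "bar"],
    "Count": [7, 5, 9, 6, 6],
    "Code": ["B", "B", "C", "B", "Z"],
})

# Boolean mask: True where df2's (ID1, ID2) pair appears as a row of df1.
keys = pd.MultiIndex.from_frame(df1[["ID1", "ID2"]])
mask = pd.MultiIndex.from_frame(df2[["ID1", "ID2"]]).isin(keys)

# Boolean indexing yields a copy; .copy() makes that explicit and avoids
# SettingWithCopyWarning when sub_df is modified below.
sub_df = df2[mask].copy()
sub_df.loc[sub_df["Count"] >= 5, "Code"] = "A"

# Changes to sub_df do not propagate to df2, so write them back by index label.
df = df2.copy()
df.loc[sub_df.index, "Code"] = sub_df["Code"]
print(df)
```

Writing back through sub_df.index only works because filtering keeps df2's original index labels; if sub_df were built with merge instead, those labels would be lost.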


2 answers

You can merge the two DataFrames with DataFrame.merge and indicator=, then use the indicator column to decide whether Code is set to A:

df = df2.merge(df1, how='left', on=['ID1','ID2'], indicator='ind')
df.loc[(df["Count"] >= 5) & (df['ind'] == 'both'), "Code"] = "A" 
df = df.drop('ind', axis=1)

print(df)

     ID1    ID2  Count Code
0    abc    def      7    B
1   fizz   buzz      5    A
2  fizz1  buzz2      9    C
3    foo    bar      6    A
4    foo    bar      6    A
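A self-contained version of the merge answer, assuming the question's sample data. It shows the values the indicator column takes and adds one guard that is not in the answer: de-duplicating the key pairs from df1 first, since a left merge repeats a df2 row once per matching df1 row if df1 contains the same (ID1, ID2) pair more than once.

```python
import pandas as pd

df1 = pd.DataFrame({"ID1": ["foo", "fizz"], "ID2": ["bar", "buzz"]})
df2 = pd.DataFrame({
    "ID1": ["abc", "fizz", "fizz1", "foo", "foo"],
    "ID2": ["def", "buzz", "buzz2", "bar", "bar"],
    "Count": [7, 5, 9, 6, 6],
    "Code": ["B", "B", "C", "B", "Z"],
})

# Guard: duplicate pairs in df1 would otherwise multiply df2 rows in the merge.
keys = df1[["ID1", "ID2"]].drop_duplicates()

# indicator="ind" adds a categorical column: "both" if the pair was found in
# keys, "left_only" if it exists only in df2.
df = df2.merge(keys, how="left", on=["ID1", "ID2"], indicator="ind")
print(df["ind"].tolist())  # ['left_only', 'both', 'left_only', 'both', 'both']

df.loc[(df["Count"] >= 5) & (df["ind"] == "both"), "Code"] = "A"
df = df.drop(columns="ind")
print(df)
```

Note that merge returns a new default RangeIndex, so if df2 carries a meaningful index it has to be restored afterwards.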

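Alternatively, you can use Series.isin to test membership of the combined ID1/ID2 values, after joining the two columns into a single string with Series.str.cat: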

id2 = df2['ID1'].str.cat(df2['ID2'], sep='_')
id1 = df1['ID1'].str.cat(df1['ID2'], sep='_')

df2.loc[(df2["Count"] >= 5) & id2.isin(id1), "Code"] = "A" 
print (df2)
     ID1    ID2  Count Code
0    abc    def      7    B
1   fizz   buzz      5    A
2  fizz1  buzz2      9    C
3    foo    bar      6    A
4    foo    bar      6    A
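One caveat with joining on "_": two different pairs can produce the same string if the IDs themselves contain the separator. The sketch below uses hypothetical data built to trigger that collision and contrasts it with a pair-wise test via pd.MultiIndex.isin, which compares (ID1, ID2) tuples directly; the MultiIndex variant is a swapped-in technique, not part of the answer above.

```python
import pandas as pd

# Hypothetical data: ("a", "b_c") and ("a_b", "c") both join to "a_b_c".
df1 = pd.DataFrame({"ID1": ["a_b"], "ID2": ["c"]})
df2 = pd.DataFrame({
    "ID1": ["a", "a_b"],
    "ID2": ["b_c", "c"],
    "Count": [5, 5],
    "Code": ["B", "B"],
})

# String-join test: both df2 rows become "a_b_c", so both look like matches.
joined_match = df2["ID1"].str.cat(df2["ID2"], sep="_").isin(
    df1["ID1"].str.cat(df1["ID2"], sep="_"))

# Pair-wise test: only the row whose actual (ID1, ID2) tuple is in df1 matches.
pair_match = pd.MultiIndex.from_frame(df2[["ID1", "ID2"]]).isin(
    pd.MultiIndex.from_frame(df1[["ID1", "ID2"]]))

print(joined_match.tolist())  # [True, True]
print(pair_match.tolist())    # [False, True]

df2.loc[(df2["Count"] >= 5) & pair_match, "Code"] = "A"
```

Choosing a separator that cannot occur in the IDs also avoids the problem, but the tuple comparison does not depend on knowing the data.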

Edit:

I tested it, and it works for me:

print (df1)
    ID1   ID2
0   foo   bar
1  fizz  buzz

print (df2)
     ID1    ID2        date  price
0    abc    def  2019-08-01      1
1   fizz   buzz  2019-08-02      2
2  fizz1  buzz2  2019-08-02      3
3    foo    bar  2019-08-03      4
4    foo    bar  2019-08-01      5

import numpy as np
import pandas as pd

df2["date"] = pd.to_datetime(df2["date"])
df2.loc[(df2["date"] != '2019-08-01') & (df2['ID1'].isin(df1['ID1'])), "price"] = np.nan
print (df2)
     ID1    ID2       date  price
0    abc    def 2019-08-01    1.0
1   fizz   buzz 2019-08-02    NaN <- set to NaN because ID1 is in df1
2  fizz1  buzz2 2019-08-02    3.0
3    foo    bar 2019-08-03    NaN <- set to NaN because ID1 is in df1
4    foo    bar 2019-08-01    5.0 <- not set to NaN: ID1 is in df1, but the date is 2019-08-01
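The edit above matches on ID1 alone. A sketch of the same date rule that instead requires the full (ID1, ID2) pair to appear in df1, as the original question asked, assuming the data shown in the edit; the MultiIndex pair test and the names pair_match and not_excluded_date are illustrative, not from the answer. On this particular data both versions give the same result; they differ only when ID1 matches but ID2 does not.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"ID1": ["foo", "fizz"], "ID2": ["bar", "buzz"]})
df2 = pd.DataFrame({
    "ID1": ["abc", "fizz", "fizz1", "foo", "foo"],
    "ID2": ["def", "buzz", "buzz2", "bar", "bar"],
    "date": ["2019-08-01", "2019-08-02", "2019-08-02", "2019-08-03", "2019-08-01"],
    "price": [1, 2, 3, 4, 5],
})
df2["date"] = pd.to_datetime(df2["date"])
# Cast to float first so NaN can be stored without an implicit dtype change.
df2["price"] = df2["price"].astype(float)

# True where the (ID1, ID2) pair appears as a row of df1.
pair_match = pd.MultiIndex.from_frame(df2[["ID1", "ID2"]]).isin(
    pd.MultiIndex.from_frame(df1[["ID1", "ID2"]]))
# Rows dated 2019-08-01 are left untouched.
not_excluded_date = df2["date"] != pd.Timestamp("2019-08-01")

df2.loc[not_excluded_date & pair_match, "price"] = np.nan
print(df2)
```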
