Python: add a column to a DataFrame containing a value from another row, based on a condition

+-----+-------+----------+-------+
| No  | Group | refGroup | Value |
+-----+-------+----------+-------+
| 123 | A1    | A1       | 5.0   |
| 123 | B1    | A1       | 7.3   |
| 123 | B2    | A1       | 8.9   |
| 123 | B3    | B1       | 7.9   |
| 465 | A1    | A1       | 1.4   |
| 465 | B1    | A1       | 4.5   |
| 465 | B2    | B1       | 7.3   |
+-----+-------+----------+-------+

+-----+-------+----------+-------+----------+
| No  | Group | refGroup | Value | refValue |
+-----+-------+----------+-------+----------+
| 123 | A1    | A1       | 5.0   | 5.0      |
| 123 | B1    | A1       | 7.3   | 2.3      |
| 123 | B2    | A1       | 8.9   | 3.9      |
| 123 | B3    | B1       | 7.9   | 0.6      |
| 465 | A1    | A1       | 1.4   | 1.4      |
| 465 | B1    | A1       | 4.5   | 3.1      |
| 465 | B2    | B1       | 7.3   | 2.8      |
+-----+-------+----------+-------+----------+

2 answers

User

#1 · edited 2024-05-19 01:37:38

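One approach borrows a SQL-style technique: a self join done with merge. Use left_on and right_on to align each row's refGroup with the Group of another row that has the same No, merging the DataFrame with itself, then subtract the reference row's value from each row's value: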

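import numpy as np

# Self join: match each row's (No, refGroup) to the row with that (No, Group)
df_out = df.merge(df,
                  left_on=['No','refGroup'],
                  right_on=['No','Group'],
                  suffixes=('','_ref'))

# Rows that reference themselves keep their own Value; others subtract the reference
df['refValue'] = np.where(df_out['Group'] == df_out['refGroup'],
                          df_out['Value'],
                          df_out['Value'] - df_out['Value_ref'])

df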

Output:

    No Group refGroup  Value  refValue
0  123    A1       A1    5.0       5.0
1  123    B1       A1    7.3       2.3
2  123    B2       A1    8.9       3.9
3  123    B3       B1    7.9       0.6
4  465    A1       A1    1.4       1.4
5  465    B1       A1    4.5       3.1
6  465    B2       B1    7.3       2.8
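
The merge above uses the default inner join, so a row whose reference is missing would be dropped and the positional assignment back to df would no longer line up. As a hedged variation (not part of the original answer), the sketch below rebuilds the sample data from the question and uses a left join instead. The names ref and merged are illustrative, and it assumes each (No, Group) pair occurs only once.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "No": [123, 123, 123, 123, 465, 465, 465],
    "Group": ["A1", "B1", "B2", "B3", "A1", "B1", "B2"],
    "refGroup": ["A1", "A1", "A1", "B1", "A1", "A1", "B1"],
    "Value": [5.0, 7.3, 8.9, 7.9, 1.4, 4.5, 7.3],
})

# Right-hand side of the self join: one row per (No, Group), renamed so the
# join keys line up with (No, refGroup) on the left.
ref = df[["No", "Group", "Value"]].rename(
    columns={"Group": "refGroup", "Value": "Value_ref"}
)

# how="left" keeps every row of df in its original order, even when a
# reference row is missing; validate="many_to_one" raises if (No, Group)
# is not unique on the right.
merged = df.merge(ref, on=["No", "refGroup"], how="left", validate="many_to_one")

# A row that references itself keeps its own Value; otherwise subtract.
df["refValue"] = np.where(
    merged["Group"] == merged["refGroup"],
    merged["Value"],
    merged["Value"] - merged["Value_ref"],
)

print(df.round(1))
```

With a left join, a row whose reference cannot be found ends up with NaN in refValue rather than silently shifting the rows that follow it.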

User

#2 · edited 2024-05-19 01:37:38

With a list comprehension, you can do:

# For each row, look up the Value of the row with the same No whose Group equals this row's refGroup
df['refValue'] = [
    row['Value'] if row['refGroup'] == row['Group']
    else row['Value'] - df.loc[(df['No'] == row['No']) & (df['Group'] == row['refGroup']), 'Value'].iloc[0]
    for _, row in df.iterrows()
]
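
The list comprehension filters the whole DataFrame once per row, which can become slow on larger data. A possible vectorized alternative (a sketch, not from the original answer) is to index Value by (No, Group) and look up every row's (No, refGroup) in one step. The names value_by_key, ref_keys and ref_value are illustrative, and the sketch assumes each (No, Group) pair is unique.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "No": [123, 123, 123, 123, 465, 465, 465],
    "Group": ["A1", "B1", "B2", "B3", "A1", "B1", "B2"],
    "refGroup": ["A1", "A1", "A1", "B1", "A1", "A1", "B1"],
    "Value": [5.0, 7.3, 8.9, 7.9, 1.4, 4.5, 7.3],
})

# Series of Value indexed by (No, Group); assumes each pair is unique.
value_by_key = df.set_index(["No", "Group"])["Value"]

# Look up the reference Value for each row's (No, refGroup) pair.
# Pairs with no matching row come back as NaN instead of raising.
ref_keys = pd.MultiIndex.from_frame(df[["No", "refGroup"]])
ref_value = value_by_key.reindex(ref_keys).to_numpy()

# A row that references itself keeps its own Value; otherwise subtract.
df["refValue"] = np.where(
    df["Group"] == df["refGroup"],
    df["Value"],
    df["Value"] - ref_value,
)

print(df.round(1))
```

Converting the lookup result with to_numpy() discards its MultiIndex, so the subtraction is positional and matches the row order of df.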
