How do I add multiple strings to a DataFrame column based on conditions in each row?

Posted 2024-05-19 18:49:34


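I have a DataFrame containing customer charges and contract charges. I want to compare each customer's charges against their contract and flag any mismatches. Here is what df looks like: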

^{tb1}$

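I want to add a column that lists all the problems in each row as a single string. This is the output I want, where the 'Problem' column contains every problem for that row: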

^{tb2}$

Here is what I have tried so far:

nonmatch["Problem"] = np.where(nonmatch['rent'] != nonmatch['rent_doc'],  "rent doesn't match", nonmatch["Problem"] + "")
nonmatch["Problem"] = np.where(nonmatch['1xdisc'] != nonmatch['1xdisc_doc'], " 1xdisc doesn't match.", "")
print(nonmatch[['Resident','Problem']])

However, any error already in the cell gets overwritten. How can I append a string to a cell's existing contents when a condition is met?

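I also have a hunch that there must be a cleaner way to do this, but I'm not sure how. I have about ten conditions I want to check; this is just a minimal example.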


2 Answers

My take:

def get_match(c):
    def match(x):
        return f'{c} doesn\'t match.' if x else ''
    return match

onex = (df['1xdisc'] != df['1xdisc_doc']).map(get_match('1xdisc'))
rent = (df['rent']   != df['rent_doc']  ).map(get_match('rent'))

df.assign(Problem=(['  '.join(filter(bool, tup)) for tup in zip(rent, onex)]))

   Resident     Tcode     MoveIn  1xdisc  1xdisc_doc  conpark  rent  rent_doc                                     Problem
0    Marcus  t0011009  3/16/2021     0.0      -500.0      0.0     0      1632  rent doesn't match.  1xdisc doesn't match.
1    Joshua  t0011124  3/20/2021     0.0         0.0      0.0  1642      1642                                            
2    Yvonne  t0010940  3/17/2021  -500.0      -500.0      0.0  1655      1655                                            
3  Mirabeau  t0011005  3/19/2021  -500.0      -500.0      0.0  1931      1990                         rent doesn't match.
4   Keyonna  t0011084  3/18/2021     0.0         0.0      0.0  1600      1600                                            
5     Ariel  t0010954  3/22/2021  -300.0         0.0      0.0  1300      1320  rent doesn't match.  1xdisc doesn't match.

Generalized:

docs = [s for s in [*df] if s.endswith('_doc')]
refs = [s.rsplit('_', 1)[0] for s in docs]

def col_match(c):
    return [f"{c.name} doesn't match" if x else "" for x in c]

problem_df = (df[refs] != df[docs].to_numpy()).apply(col_match)
problem = ['  '.join(filter(bool, tup)) for tup in zip(*map(problem_df.get, refs))]
df.assign(Problem=problem)
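
For the roughly ten checks mentioned in the question, another option (not taken from either answer) is to start from an empty string column and append to it in a loop, so earlier messages are never overwritten. A minimal sketch, assuming the sample values from the printed output above; the checks mapping and the message wording are hypothetical:

```python
import pandas as pd

# Sample values copied from the printed output above (only the compared columns).
df = pd.DataFrame({
    "Resident": ["Marcus", "Joshua", "Yvonne", "Mirabeau", "Keyonna", "Ariel"],
    "1xdisc": [0.0, 0.0, -500.0, -500.0, 0.0, -300.0],
    "1xdisc_doc": [-500.0, 0.0, -500.0, -500.0, 0.0, 0.0],
    "rent": [0, 1642, 1655, 1931, 1600, 1300],
    "rent_doc": [1632, 1642, 1655, 1990, 1600, 1320],
})

# Hypothetical mapping: column to check -> contract column it is compared with.
checks = {"rent": "rent_doc", "1xdisc": "1xdisc_doc"}

# Start with an empty string in every row, then append one message per failed check.
problem = pd.Series("", index=df.index)
for col, doc in checks.items():
    mismatch = df[col].ne(df[doc])
    # Keep the existing text where the values match; otherwise append the message.
    problem = problem.where(~mismatch, problem + f" {col} doesn't match.")

# Drop the leading space left by the first appended message.
df["Problem"] = problem.str.strip()
print(df[["Resident", "Problem"]])
```

Adding another check then only means adding one entry to the mapping. Note that ne treats NaN as unequal to NaN, so missing values on both sides would be flagged as a mismatch.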

You can also try concat together with groupby + agg. As piR said, this may be over-engineered:

c1 = df['rent'].ne(df['rent_doc'])
c2 = df['1xdisc'].ne(df['1xdisc_doc'])
choices= ["rent doesn't match"," 1xdisc doesn't match."]

s = pd.concat((c1,c2),keys=choices).swaplevel()
out = (df.assign(Problem=
      pd.DataFrame.from_records(s[s].index).groupby(0)[1].agg(" ".join)))

print(out)

   Resident     Tcode     MoveIn  1xdisc  1xdisc_doc  conpark  rent  rent_doc  \
0    Marcus  t0011009  3/16/2021     0.0      -500.0      0.0     0      1632   
1    Joshua  t0011124  3/20/2021     0.0         0.0      0.0  1642      1642   
2    Yvonne  t0010940  3/17/2021  -500.0      -500.0      0.0  1655      1655   
3  Mirabeau  t0011005  3/19/2021  -500.0      -500.0      0.0  1931      1990   
4   Keyonna  t0011084  3/18/2021     0.0         0.0      0.0  1600      1600   
5     Ariel  t0010954  3/22/2021  -300.0         0.0      0.0  1300      1320   

                                     Problem  
0  rent doesn't match  1xdisc doesn't match.  
1                                        NaN  
2                                        NaN  
3                         rent doesn't match  
4                                        NaN  
5  rent doesn't match  1xdisc doesn't match. 
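
Rows with no mismatch come out as NaN here, because they have no entry in the grouped result. If empty strings are preferred, one possibility is to fill them in afterwards. A minimal sketch under assumptions: the sample values are copied from the output above, the message labels are tidied for illustration, and the grouping step is rewritten with get_level_values rather than the from_records call used in the answer:

```python
import pandas as pd

# Sample values copied from the printed output above (only the compared columns).
df = pd.DataFrame({
    "Resident": ["Marcus", "Joshua", "Yvonne", "Mirabeau", "Keyonna", "Ariel"],
    "1xdisc": [0.0, 0.0, -500.0, -500.0, 0.0, -300.0],
    "1xdisc_doc": [-500.0, 0.0, -500.0, -500.0, 0.0, 0.0],
    "rent": [0, 1642, 1655, 1931, 1600, 1300],
    "rent_doc": [1632, 1642, 1655, 1990, 1600, 1320],
})

c1 = df["rent"].ne(df["rent_doc"])
c2 = df["1xdisc"].ne(df["1xdisc_doc"])
# Illustrative labels; the answer above uses slightly different strings.
choices = ["rent doesn't match.", "1xdisc doesn't match."]

# Stack both boolean masks; after swaplevel the index is (row label, message).
s = pd.concat([c1, c2], keys=choices).swaplevel()
hits = s[s]  # keep only the (row, message) pairs where the check failed

# One message per failed check, indexed by the original row label.
labels = pd.Series(hits.index.get_level_values(1),
                   index=hits.index.get_level_values(0))
problem = labels.groupby(level=0).agg(" ".join)

# assign aligns on the index, so rows without any hit become NaN ...
out = df.assign(Problem=problem)
# ... and fillna turns those into empty strings.
out["Problem"] = out["Problem"].fillna("")
print(out[["Resident", "Problem"]])
```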
