I have a DataFrame like the one below:
ID   TYPE  POLICY_NUMBER      DISB_AMT
738  20    FLDINC MSH39990    1
738  21    MSH39990           3848
750  20    INF395737          1
750  21    INF395737 FLDINCL  2350
892  20    SJK389743          3904
892  21    MSH284989          1
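I am trying to group by ID, extract the policy number, and then look for it in the other type (TYPE = 20 or 21). If the policy number is the same in both types, check whether DISB_AMT > 1 in either of the two rows. If so, do not append that ID to the result DataFrame.
For example, ID 738 has the same policy number, MSH39990, in both rows. I wrote a script that extracts only the digits so the values can be compared. Now we check DISB_AMT > 1: the first row is not > 1, but the second row is (3848 > 1), so this ID is not included in the result. For ID 892, the policy numbers differ between the two types, so we only check DISB_AMT > 1 on the TYPE 21 row. Since it is not > 1, we add that row to the result DataFrame.
How can I compare against the other type, check whether the policy numbers match, and build the rest of the logic?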
Expected output
ID   TYPE  POLICY_NUMBER  DISB_AMT
892  21    MSH284989      1
Code
import pandas as pd

data = [
    {"ID": 738, "TYPE": 20, "POLICY_NUMBER": "FLDINC MSH39990", "DISB_AMT": 1},
    {"ID": 738, "TYPE": 21, "POLICY_NUMBER": "MSH39990", "DISB_AMT": 3848},
    {"ID": 750, "TYPE": 20, "POLICY_NUMBER": "INF395737", "DISB_AMT": 1},
    {"ID": 750, "TYPE": 21, "POLICY_NUMBER": "INF395737 FLDINCL", "DISB_AMT": 2350},
    {"ID": 892, "TYPE": 20, "POLICY_NUMBER": "SJK389743", "DISB_AMT": 3904},
    {"ID": 892, "TYPE": 21, "POLICY_NUMBER": "MSH284989", "DISB_AMT": 1},
]
df = pd.DataFrame(data)
# Keep only the first run of digits from each policy number so the rows can be compared
df['CLEANED_POL_NBR'] = df.POLICY_NUMBER.str.extract(r'(\d+)', expand=False)
IIUC:
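One way to express the rules from the question is a pair of per-ID flags combined into a boolean mask. This is only a sketch, not necessarily the approach originally posted here. It assumes that each ID has one TYPE 20 row and one TYPE 21 row, that an ID whose policy numbers match and whose rows are all not > 1 should be kept whole (the question does not cover that case), and the names `same_policy`, `any_gt_1`, `keep`, and `result` are made up for illustration.

```python
import pandas as pd

data = [
    {"ID": 738, "TYPE": 20, "POLICY_NUMBER": "FLDINC MSH39990", "DISB_AMT": 1},
    {"ID": 738, "TYPE": 21, "POLICY_NUMBER": "MSH39990", "DISB_AMT": 3848},
    {"ID": 750, "TYPE": 20, "POLICY_NUMBER": "INF395737", "DISB_AMT": 1},
    {"ID": 750, "TYPE": 21, "POLICY_NUMBER": "INF395737 FLDINCL", "DISB_AMT": 2350},
    {"ID": 892, "TYPE": 20, "POLICY_NUMBER": "SJK389743", "DISB_AMT": 3904},
    {"ID": 892, "TYPE": 21, "POLICY_NUMBER": "MSH284989", "DISB_AMT": 1},
]
df = pd.DataFrame(data)

# Same cleaning step as in the question: keep only the digits of the policy number.
df["CLEANED_POL_NBR"] = df["POLICY_NUMBER"].str.extract(r"(\d+)", expand=False)

grouped = df.groupby("ID")

# True on every row of an ID whose rows all share one cleaned policy number.
same_policy = grouped["CLEANED_POL_NBR"].transform("nunique").eq(1)

# True on every row of an ID where at least one row has DISB_AMT > 1.
any_gt_1 = grouped["DISB_AMT"].transform("max").gt(1)

# Rule 1: matching policy numbers -> drop the whole ID if any row is > 1
#         (assumption: otherwise keep all of its rows).
# Rule 2: different policy numbers -> keep only the TYPE 21 row, and only
#         when its DISB_AMT is not > 1.
keep = (same_policy & ~any_gt_1) | (
    ~same_policy & df["TYPE"].eq(21) & df["DISB_AMT"].le(1)
)

result = df.loc[keep, ["ID", "TYPE", "POLICY_NUMBER", "DISB_AMT"]]
print(result)
```

Using `transform` keeps the flags aligned with the original rows, so the filter stays a single boolean mask and no per-group function is needed. Note that comparing digits only means two policy numbers with different letter prefixes but the same digits count as equal, which follows the cleaning step chosen in the question.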
Output: