基于其他列值的存在删除重复行

2024-06-16 14:23:37 发布

您现在位置:Python中文网/ 问答频道 /正文

我是python新手,正在尝试删除任何带有空EOB代码的行,其中该Account Number已经存在多个EOB代码。例如,我们有这个“407”Account Number贡献了三行。我希望删除缺少EOB代码的行,但保留剩下的两行(EOB代码7730和3033)

enter image description here

然而,这里的复杂性(至少对我来说)是其他Account Number从未有过EOB代码。就像下面以“2300”和“6200”结尾的账户一样。在这些特定情况下,这些类型的帐户应保留在数据帧中

enter image description here

以下是此数据集的一小部分:

data = {'Account Number': ['407','407','407','4901','4901','4901','4901','4901','6902','6902','6902','6902','8700','6900','2300','6200','2400','2400','3200','3200','3200','3200','3200','3200','3400','2200','3300','7701','7701','7701','7701','7701','7701','3100','401','401','401','6600','6600','6600','6600'],
     'Payer':['BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS','BCBS'],
     'Remit Type':['IP Denied','IP Denied','IP Denied','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Denied','IP Denied','IP Denied','IP Denied','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Paid','IP Denied','IP Denied','IP Denied','IP Denied','IP Denied','IP Denied','IP Paid','IP Denied','IP Denied','IP Denied','IP Paid','IP Paid','IP Paid','IP Paid'],
     'EOB':['','7730','3033','5001','','9932','3035','3038','9015','5000','','9932','','','','','','','','3035','829','9932','2635','5002','','','','851','','852','9932','818','9015','','','2628','3035','5003','','3035','9932'],
     'Date':['Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10','Mar 10'],
     'Status':['INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS SUSPENDED', 'INPATIENT CLAIMS PAID','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS SUSPENDED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS DENIED','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID','INPATIENT CLAIMS PAID']}
df = pd.DataFrame(data,columns=['Account Number','Payer','Remit Type','EOB','Date','Status'])

Tags: 数据代码ipnumberaccountmarinpatientclaims
3条回答

首先,您要检查与帐户关联的所有EOB是否为空。然后您可以将这些与非空EOB组合:

all_empty = df['EOB'].eq('').groupby(df['Account Number']).transform('all')

df[all_empty | df['EOB'].ne('')]

我将尝试在以下条件下确定要删除的索引:

  • 帐号至少有一行具有非空EOB
  • EOB是空字符串

它可以是:

  1. 查找相关的帐号:

    x = df[df.EOB != ''].groupby('Account Number').count()[[]]
    
  2. 删除行:

    df.drop(df.merge(x, left_on='Account Number', right_index=True).query("EOB==''").index,
            inplace=True)
    

从样本开始,它给出:

    Account Number Payer Remit Type   EOB    Date                      Status
1              407  BCBS  IP Denied  7730  Mar 10     INPATIENT CLAIMS DENIED
2              407  BCBS  IP Denied  3033  Mar 10     INPATIENT CLAIMS DENIED
3             4901  BCBS    IP Paid  5001  Mar 10  INPATIENT CLAIMS SUSPENDED
5             4901  BCBS    IP Paid  9932  Mar 10  INPATIENT CLAIMS SUSPENDED
6             4901  BCBS    IP Paid  3035  Mar 10  INPATIENT CLAIMS SUSPENDED
7             4901  BCBS    IP Paid  3038  Mar 10  INPATIENT CLAIMS SUSPENDED
8             6902  BCBS  IP Denied  9015  Mar 10     INPATIENT CLAIMS DENIED
9             6902  BCBS  IP Denied  5000  Mar 10     INPATIENT CLAIMS DENIED
11            6902  BCBS  IP Denied  9932  Mar 10     INPATIENT CLAIMS DENIED
12            8700  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
13            6900  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
14            2300  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
15            6200  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
16            2400  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
17            2400  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
19            3200  BCBS    IP Paid  3035  Mar 10       INPATIENT CLAIMS PAID
20            3200  BCBS    IP Paid   829  Mar 10       INPATIENT CLAIMS PAID
21            3200  BCBS    IP Paid  9932  Mar 10       INPATIENT CLAIMS PAID
22            3200  BCBS    IP Paid  2635  Mar 10       INPATIENT CLAIMS PAID
23            3200  BCBS    IP Paid  5002  Mar 10       INPATIENT CLAIMS PAID
24            3400  BCBS    IP Paid        Mar 10  INPATIENT CLAIMS SUSPENDED
25            2200  BCBS    IP Paid        Mar 10       INPATIENT CLAIMS PAID
26            3300  BCBS    IP Paid        Mar 10  INPATIENT CLAIMS SUSPENDED
27            7701  BCBS  IP Denied   851  Mar 10     INPATIENT CLAIMS DENIED
29            7701  BCBS  IP Denied   852  Mar 10     INPATIENT CLAIMS DENIED
30            7701  BCBS  IP Denied  9932  Mar 10     INPATIENT CLAIMS DENIED
31            7701  BCBS  IP Denied   818  Mar 10     INPATIENT CLAIMS DENIED
32            7701  BCBS  IP Denied  9015  Mar 10     INPATIENT CLAIMS DENIED
33            3100  BCBS    IP Paid        Mar 10  INPATIENT CLAIMS SUSPENDED
35            0401  BCBS  IP Denied  2628  Mar 10     INPATIENT CLAIMS DENIED
36            0401  BCBS  IP Denied  3035  Mar 10     INPATIENT CLAIMS DENIED
37            6600  BCBS    IP Paid  5003  Mar 10       INPATIENT CLAIMS PAID
39            6600  BCBS    IP Paid  3035  Mar 10       INPATIENT CLAIMS PAID
40            6600  BCBS    IP Paid  9932  Mar 10       INPATIENT CLAIMS PAID

首先,就个人而言,我建议不要在pandas中使用''空字符串。改用np.nan

import numpy as np

df['EOB'] = df['EOB'].replace('', np.nan)

然后定义一个helper函数,当有超过1行时,仅将其应用于dropna,并基于Account Number将其应用于groupby

def dropna_if_not_the_only_row(df):
    if df.shape[0] == 1:
        return df
    else:
        return df.dropna()

df.groupby('Account Number').apply(lambda x : dropna_if_not_the_only_row(x)).reset_index(drop=True)

相关问题 更多 >