Dropping duplicate values in one specific column of a MultiIndex DataFrame, taking level 0 into account

Posted 2024-06-16 10:16:38


I have a MultiIndex DataFrame that looks like this:

import pandas as pd

# Build the frame from its dict form; the tuple keys become a four-level MultiIndex
df = pd.DataFrame({'Modality': {('0020413', '1', '6/21/2017', 'DTI'): 1,
  ('0020413', '1', '6/21/2017', 'FLAIR'): 1,
  ('0020413', '1', '6/21/2017', 'T1'): 1,
  ('0020413', '3', '8/27/2019', 'DTI'): 1,
  ('0020413', '3', '8/27/2019', 'FLAIR'): 1,
  ('0020413', '3', '8/27/2019', 'T1'): 1,
  ('0021261', '1', '3/15/2017', 'DTI'): 1,
  ('0021261', '1', '3/15/2017', 'FLAIR'): 1,
  ('0021261', '1', '3/15/2017', 'T1'): 1,
  ('0021261', '2', '4/24/2018', 'DTI'): 1,
  ('0021261', '2', '4/24/2018', 'FLAIR'): 1,
  ('0021261', '2', '4/24/2018', 'T1'): 1,
  ('0021261', '3', '5/01/2019', 'DTI'): 1,
  ('0021261', '3', '5/01/2019', 'FLAIR'): 1,
  ('0021261', '3', '5/01/2019', 'T1'): 1},
 'Phase': {('0020413', '1', '6/21/2017', 'DTI'): 1,
  ('0020413', '1', '6/21/2017', 'FLAIR'): 1,
  ('0020413', '1', '6/21/2017', 'T1'): 1,
  ('0020413', '3', '8/27/2019', 'DTI'): 1,
  ('0020413', '3', '8/27/2019', 'FLAIR'): 1,
  ('0020413', '3', '8/27/2019', 'T1'): 1,
  ('0021261', '1', '3/15/2017', 'DTI'): 1,
  ('0021261', '1', '3/15/2017', 'FLAIR'): 1,
  ('0021261', '1', '3/15/2017', 'T1'): 1,
  ('0021261', '2', '4/24/2018', 'DTI'): 1,
  ('0021261', '2', '4/24/2018', 'FLAIR'): 1,
  ('0021261', '2', '4/24/2018', 'T1'): 1,
  ('0021261', '3', '5/01/2019', 'DTI'): 1,
  ('0021261', '3', '5/01/2019', 'FLAIR'): 1,
  ('0021261', '3', '5/01/2019', 'T1'): 1}})

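I have been trying to drop duplicate values in level_3. The duplicates do not show up in the sample above: the full dataset is very large and I could not isolate the specific rows that contain them. Sometimes, though, a single 'level_0' has more than three values in 'level_3', and those extra values are repeats. For example, for one 'level_0' you can find 'DTI, FLAIR, FLAIR, T1, T1'.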

This is what I tried:

df = df.drop_duplicates(subset = 'Description', keep = "first")

But I get an error:

KeyError: Index(['Description'], dtype='object')

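I believe this happens because the DataFrame is multi-indexed, but I could not find any information on dropping duplicates in a MultiIndex DataFrame.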
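A small sketch of what is probably going on, using a made-up four-row frame rather than the real data: `drop_duplicates(subset=...)` only looks up column labels, so a name that is not a column (here 'Description', and likewise any index level) raises `KeyError`. After `reset_index()`, the unnamed index levels show up as ordinary columns called `level_0` to `level_3`. The names `small`, `raised` and `flat` are illustrative only.

```python
import pandas as pd

# Hypothetical miniature of the question's data, with one repeated FLAIR row
idx = pd.MultiIndex.from_tuples([
    ('0020413', '1', '6/21/2017', 'DTI'),
    ('0020413', '1', '6/21/2017', 'FLAIR'),
    ('0020413', '1', '6/21/2017', 'FLAIR'),
    ('0020413', '1', '6/21/2017', 'T1'),
])
small = pd.DataFrame({'Modality': [1, 1, 1, 1], 'Phase': [1, 1, 1, 1]}, index=idx)

# 'Description' is neither a column nor something subset= can resolve
try:
    small.drop_duplicates(subset='Description', keep='first')
    raised = False
except KeyError:
    raised = True

# Moving the index into columns makes the levels addressable by name
flat = small.reset_index()
flat_columns = list(flat.columns)
print(raised, flat_columns)
```

Once the levels are columns, they can be passed to `subset` like any other column.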

Can you help me?


1 answer
User
#1 · Posted 2024-06-16 10:16:38

IIUC

Try:

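# Move the four unnamed index levels into columns level_0 .. level_3
df=df.reset_index()

# Keep only rows whose level_3 value is one of the three expected modalities
out=df[df['level_3'].isin(['DTI', 'FLAIR', 'T1'])]

# Drop rows that repeat the same four-level key, then rebuild the MultiIndex
out=out.drop_duplicates(['level_0','level_1','level_2','level_3']).set_index(['level_0','level_1','level_2','level_3'])

# Clear the level names so the index is unnamed again, as in the original frame
out.index.names=[None,None,None,None]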

Now, if you print out, you will get the expected output.
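As a possible alternative, not part of the answer above: when the duplicates are rows whose entire index tuple repeats, `Index.duplicated` can flag them directly without resetting the index. The sketch below uses a made-up frame that mimics the 'DTI, FLAIR, FLAIR, T1, T1' case from the question; `small` and `deduped` are illustrative names, and it assumes that keeping the first occurrence of each key is what is wanted.

```python
import pandas as pd

# Hypothetical frame reproducing the 'DTI, FLAIR, FLAIR, T1, T1' pattern
idx = pd.MultiIndex.from_tuples([
    ('0020413', '1', '6/21/2017', 'DTI'),
    ('0020413', '1', '6/21/2017', 'FLAIR'),
    ('0020413', '1', '6/21/2017', 'FLAIR'),
    ('0020413', '1', '6/21/2017', 'T1'),
    ('0020413', '1', '6/21/2017', 'T1'),
])
small = pd.DataFrame({'Modality': [1] * 5, 'Phase': [1] * 5}, index=idx)

# duplicated() marks every repeat of an index tuple after its first occurrence
mask = small.index.duplicated(keep='first')

# Keep only the rows that are not flagged as repeats
deduped = small[~mask]
remaining = deduped.index.get_level_values(3).tolist()
print(remaining)
```

This compares all four levels at once, so the same modality recorded on different visits (different level_1 or level_2 values) is kept.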
