python会按条件删除重复的列

data={"col1":[2,3,4,5,9,2,6], "col2":[4,2,4,6,0,1,5], "col3":[7,6,0,11,3,6,7], "col4":[14,11,22,8,6,3,9], "col5":[0,5,7,3,8,2,9], "type":["A","A","C","D","B","B","E"], "number":["one","two","two","one","one","two","two"]} df=pd.DataFrame.from_dict(data)

data={"col1":[3,4,5,2,6], "col2":[2,4,6,1,5], "col3":[6,0,11,6,7], "col4":[11,22,8,3,9], "col5":[5,7,3,2,9], "type":["A","C","D","B","E"], "number":["two","two","one","two","two"]} df=pd.DataFrame.from_dict(data)

2条回答

网友

1楼 · 编辑于 2024-05-13 04:18:21

试试这个：

df = df.drop_duplicates(subset=['type'],keep='last')
print(df)

输出：

    col1    col2    col3    col4    col5    type    number
1   3       2       6       11      5       A       two
2   4       4       0       22      7       C       two
3   5       6       11      8       3       D       one
5   2       1       6       3       2       B       two
6   6       5       7       9       9       E       two

网友

2楼 · 编辑于 2024-05-13 04:18:21

您可以链接2个条件-通过比较^{}和^{}的反转掩码，选择所有非one值：

df1 = df[df['number'].ne('one') | ~df['type'].duplicated(keep=False)]
print (df1)
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two

关于有序分类的另一个想法：

cats = pd.unique(['one'] + df['number'].unique().tolist())

df['number'] = pd.Categorical(df['number'], categories=cats, ordered=True)

df2 = df.sort_values('number').drop_duplicates(subset=['type'], keep='last').sort_index()
print (df2)
   col1  col2  col3  col4  col5 type number
1     3     2     6    11     5    A    two
2     4     4     0    22     7    C    two
3     5     6    11     8     3    D    one
5     2     1     6     3     2    B    two
6     6     5     7     9     9    E    two

相关问题更多 >

编程相关推荐

热门问题

热门文章