Moving duplicate rows from a subset of columns to another DataFrame in Python

2 Answers

User

#1 · Edited 2024-04-19 21:32:03

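Method 1:
Use nunique with dropna=False

# Boolean mask: True for columns that hold a single distinct value (NaN counts as a value)
m = df.nunique(dropna=False).eq(1)

# First row of the constant columns only
df_dup = df.iloc[[0], m.values]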

Out[121]:
      0       1    2
0  cats  tigers  3.5

df_notdup = df.loc[:, ~m]

Out[123]:
      3     4     5     6
0     1  cars   2.0   5.0
1     6   7.2  22.6   5.0
2  test   2.6  99.0  52.3
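
Method 1 can be tried end to end with the minimal sketch below. The input frame is an assumption: the post never shows `df`, so it is reconstructed here from the two outputs above. It uses `.loc[:, m].head(1)` rather than `iloc`, which selects the same single row.

```python
import pandas as pd

# Hypothetical input, reconstructed from the outputs shown above.
df = pd.DataFrame(
    [
        ["cats", "tigers", 3.5, 1, "cars", 2.0, 5.0],
        ["cats", "tigers", 3.5, 6, 7.2, 22.6, 5.0],
        ["cats", "tigers", 3.5, "test", 2.6, 99.0, 52.3],
    ]
)

# True for each column that holds exactly one distinct value.
m = df.nunique(dropna=False).eq(1)

df_dup = df.loc[:, m].head(1)  # one representative row of the constant columns
df_notdup = df.loc[:, ~m]      # the columns that vary between rows

print(df_dup)
print(df_notdup)
```

Note that column 6 repeats 5.0 in two of the three rows, but it still lands in `df_notdup` because it has two distinct values.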

Method 2:
Use a list comprehension that calls duplicated with keep=False on each column, then checks the result with all

import numpy as np

m = np.array([df[x].duplicated(keep=False).all() for x in df])

df_dup = df.loc[:, m]

Out[65]:
      0       1    2
0  cats  tigers  3.5
1  cats  tigers  3.5
2  cats  tigers  3.5

As @Moys mentioned, if you only want a single row in df_dup, you can use drop_duplicates, or simply .head(1) or iloc

df_dup = df.loc[:, m].head(1)

Or

df_dup = df.iloc[[0], m]

Out[91]:
      0       1    2
0  cats  tigers  3.5
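
The `drop_duplicates` option mentioned above is not shown in the answer, so here is a minimal sketch of it. The input frame is an assumption, reconstructed from the outputs in this answer.

```python
import numpy as np
import pandas as pd

# Hypothetical input, reconstructed from the outputs shown in this answer.
df = pd.DataFrame(
    [
        ["cats", "tigers", 3.5, 1, "cars", 2.0, 5.0],
        ["cats", "tigers", 3.5, 6, 7.2, 22.6, 5.0],
        ["cats", "tigers", 3.5, "test", 2.6, 99.0, 52.3],
    ]
)

# True for each column in which every value is a duplicate of another one.
m = np.array([df[x].duplicated(keep=False).all() for x in df])

# All selected rows are identical, so drop_duplicates keeps only the first.
df_dup = df.loc[:, m].drop_duplicates()

print(df_dup)
```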

For the non-duplicated columns:

df_notdup = df.loc[:, ~m]

Out[75]:
      3     4     5     6
0     1  cars   2.0   5.0
1     6   7.2  22.6   5.0
2  test   2.6  99.0  52.3
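
As a rough sketch, Method 2 can be wrapped in a small helper. The name `split_constant_columns` and the input data are hypothetical and do not come from the answer. The block also compares the two masks on a one-row frame, where they disagree: `nunique` reports one distinct value per column, while `duplicated(keep=False)` finds nothing to pair the single row with.

```python
import numpy as np
import pandas as pd


def split_constant_columns(df):
    """Hypothetical helper: return (df_dup, df_notdup) using the duplicated-based mask."""
    m = np.array([df[x].duplicated(keep=False).all() for x in df], dtype=bool)
    return df.loc[:, m].head(1), df.loc[:, ~m]


# Hypothetical input, reconstructed from the outputs shown in this answer.
df = pd.DataFrame(
    [
        ["cats", "tigers", 3.5, 1, "cars", 2.0, 5.0],
        ["cats", "tigers", 3.5, 6, 7.2, 22.6, 5.0],
        ["cats", "tigers", 3.5, "test", 2.6, 99.0, 52.3],
    ]
)

df_dup, df_notdup = split_constant_columns(df)

# Edge case: with a single row the two methods give opposite masks.
one_row = df.head(1)
mask_nunique = one_row.nunique(dropna=False).eq(1).to_numpy()
mask_duplicated = np.array(
    [one_row[x].duplicated(keep=False).all() for x in one_row], dtype=bool
)

print(mask_nunique)
print(mask_duplicated)
```

If one-row inputs are possible, the choice between the two methods should account for this difference.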

User

#2 · Edited 2024-04-19 21:32:03

You can use

df1 = pd.DataFrame(df.val.str.extract('([a-zA-Z ]+)', expand=False).str.strip().drop_duplicates()) #'val' is the column in which you have these values
print(df1)

Output

     val
0   ABCD

and

df2 = pd.DataFrame(df.val.str.extract('([0-9]+)', expand=False).str.strip().drop_duplicates()) #'val' is the column in which you have these values
print(df2)

Output
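
The result of this second snippet depends on the digits present in `val`, which the post does not show. A minimal sketch of both extractions follows, assuming a hypothetical `val` column whose entries combine a letter part and a numeric part. The sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical data: the contents of 'val' are not shown in the post.
df = pd.DataFrame({"val": ["ABCD 123", "ABCD 456", "ABCD 123"]})

# Letter part: capture letters and spaces, trim whitespace, keep unique values.
letters = (
    df["val"].str.extract(r"([a-zA-Z ]+)", expand=False).str.strip().drop_duplicates()
)

# Numeric part: capture the first run of digits, keep unique values.
digits = df["val"].str.extract(r"([0-9]+)", expand=False).drop_duplicates()

df1 = letters.to_frame()
df2 = digits.to_frame()

print(df1)
print(df2)
```

With `expand=False` and a single capture group, `str.extract` returns a Series, which is why `drop_duplicates` can be chained directly before converting back to a DataFrame.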
