Selecting/excluding sets of columns in pandas

import numpy as np
import pandas as pd

# Create a dataframe with columns A, B, C and D
df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

# Try to create a second dataframe df2 from df with all columns except 'B' and 'D'
my_cols = set(df.columns)
my_cols.remove('B').remove('D')

# This returns an error ("unhashable type: set")
df2 = df[my_cols]

3 Answers

User

Answer 1 · edited 2024-04-17 14:08:48

You can either drop the columns you don't need or select the ones you do need.

# Drop by position (positions 1 and 3 are 'B' and 'D')
df2 = df.drop(df.columns[[1, 3]], axis=1)

# Drop by name
df2 = df.drop(['B', 'D'], axis=1)

# Or select the columns you want to keep
df2 = df[['A', 'C']]
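
As a minimal sketch of the two approaches above, assuming a four-column frame like the one in the question (the small df here is rebuilt only for illustration), drop with the columns keyword and plain column selection give the same result, and neither modifies df unless inplace=True is passed:

```python
import numpy as np
import pandas as pd

# Rebuild a frame like the one in the question (illustrative random data).
df = pd.DataFrame(np.random.randn(5, 4), columns=list("ABCD"))

# Exclude by name; `columns=` is equivalent to passing the labels with axis=1.
dropped = df.drop(columns=["B", "D"])

# Include by name.
selected = df[["A", "C"]]

print(list(dropped.columns))     # columns left after the drop
print(dropped.equals(selected))  # both approaches yield the same frame
print(list(df.columns))          # the original frame is unchanged
```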

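User

Answer 2 · edited 2024-04-17 14:08:48

The pandas Index has a method called difference. It returns the original columns with the columns passed as the argument removed.

Here, the result is used to remove columns B and D from df: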

df2 = df[df.columns.difference(['B', 'D'])]

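Note that this is a set-based method, so duplicate column names will cause problems, and the column order may change (by default the result is sorted).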
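
To illustrate the ordering caveat, here is a small sketch with a deliberately unsorted, hypothetical set of column names. By default difference returns a sorted result; passing sort=False (a parameter available in reasonably recent pandas versions) or using a boolean mask built with Index.isin keeps the original order. The mask variant is offered as an alternative and is not part of the answer above.

```python
import pandas as pd

# Hypothetical frame whose columns are not in alphabetical order.
df = pd.DataFrame([[1, 2, 3, 4]], columns=["D", "B", "A", "C"])

# Default behaviour: the remaining labels come back sorted.
sorted_cols = df.columns.difference(["B"])

# sort=False keeps the labels in their original order.
kept_order = df.columns.difference(["B"], sort=False)

# Boolean mask: keeps the original order without going through a set operation.
masked = df.loc[:, ~df.columns.isin(["B"])]

print(list(sorted_cols))
print(list(kept_order))
print(list(masked.columns))
```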

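Advantage over drop: when you only need the list of columns, you do not create a copy of the entire DataFrame. For example, to drop duplicate rows based on a subset of columns: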

# may create a copy of the dataframe
subset = df.drop(['B', 'D'], axis=1).columns

# does not create a copy of the dataframe
subset = df.columns.difference(['B', 'D'])

df = df.drop_duplicates(subset=subset)
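
A small worked sketch of the same idea, using made-up data so the effect is visible: rows are compared only on the columns left after excluding B and D, yet the result still contains all four columns.

```python
import pandas as pd

# Illustrative data: rows 0 and 1 agree on A and C but differ on B and D.
df = pd.DataFrame({
    "A": [1, 1, 2],
    "B": [10, 20, 30],
    "C": [5, 5, 6],
    "D": [7, 8, 9],
})

# Columns to compare on: everything except B and D.
subset = df.columns.difference(["B", "D"])

# Keeps the first row of each group of duplicates (the default keep="first").
deduped = df.drop_duplicates(subset=subset)

print(list(subset))
print(deduped)
```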

User
Answer 3 · edited 2024-04-17 14:08:48

You don't need to convert it to a set:

cols = [col for col in df.columns if col not in ['B', 'D']]
df2 = df[cols]
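
For context on why the code in the question fails before it even reaches the indexing line: set.remove modifies the set in place and returns None, so the chained .remove('D') call raises an AttributeError. The sketch below demonstrates that and wraps the list comprehension in a small helper; exclude_columns is a hypothetical name used only for illustration, not a pandas function.

```python
import numpy as np
import pandas as pd


def exclude_columns(frame, names):
    """Return `frame` without the columns listed in `names`, keeping column order."""
    unwanted = set(names)  # used only for membership tests, never as an indexer
    return frame[[col for col in frame.columns if col not in unwanted]]


df = pd.DataFrame(np.random.randn(5, 4), columns=list("ABCD"))

# set.remove works in place and returns None, so it cannot be chained.
my_cols = set(df.columns)
result = my_cols.remove("B")
print(result)

df2 = exclude_columns(df, ["B", "D"])
print(list(df2.columns))
```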
