Selecting/excluding sets of columns in pandas

518 votes

9 answers

987794 views

Asked 2025-04-17 16:16

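I would like to create views or dataframes from an existing dataframe based on column selections.

For example, I would like to create a dataframe df2 from a dataframe df1 that holds all of df1's columns except two of them. I tried the following, but it didn't work: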

import numpy as np
import pandas as pd

# Create a dataframe with columns A,B,C and D
df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

# Try to create a second dataframe df2 from df with all columns except 'B' and 'D'
my_cols = set(df.columns)
my_cols.remove('B').remove('D')

# This returns an error ("unhashable type: set")
df2 = df[my_cols]

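What am I doing wrong? More generally, what mechanisms does pandas have to support picking and excluding arbitrary sets of columns from a dataframe?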

Tags: data processing, data filtering, column selection, pandas, dataframe, data view

9 Answers

164

Another option, without dropping or filtering in a loop:

import numpy as np
import pandas as pd

# Create a dataframe with columns A,B,C and D
df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

# include the columns you want
df[df.columns[df.columns.isin(['A', 'B'])]]

# or more simply include columns:
df[['A', 'B']]

# exclude columns you don't want
df[df.columns[~df.columns.isin(['C','D'])]]

# or even simpler since 0.24
# with the caveat that it reorders columns alphabetically 
df[df.columns.difference(['C', 'D'])]
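
To make the reordering caveat concrete, here is a minimal sketch on a small, hypothetical frame whose columns are deliberately not in alphabetical order. It assumes a pandas version where `Index.difference` accepts the `sort` parameter (added in 0.24, as far as I know); the frame and variable names are only for illustration.

```python
import pandas as pd

# Hypothetical one-row frame with columns out of alphabetical order.
df = pd.DataFrame([[1, 2, 3, 4]], columns=['D', 'A', 'C', 'B'])

# Boolean mask on the column index: keeps the original left-to-right order.
by_mask = df.loc[:, ~df.columns.isin(['C'])]

# Index.difference with its default sorting: the labels come back sorted.
by_diff = df[df.columns.difference(['C'])]

# sort=False asks difference not to sort the remaining labels.
by_diff_unsorted = df[df.columns.difference(['C'], sort=False)]

mask_cols = list(by_mask.columns)
diff_cols = list(by_diff.columns)
unsorted_cols = list(by_diff_unsorted.columns)
```

If column order matters downstream, the mask form or `sort=False` is probably the safer choice.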

Answered 2025-04-17 by Python大师


236

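There is a new index method called difference. It returns the original columns, with the columns passed as argument removed.

Here, the result is used to remove columns B and D from df: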

df2 = df[df.columns.difference(['B', 'D'])]

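Note that it is a set-based method, so duplicate column names will cause issues, and the column order may change.

Advantage over drop: you don't create a copy of the entire dataframe when you only need the list of columns. For instance, to drop duplicates on a subset of columns: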

# may create a copy of the dataframe
subset = df.drop(['B', 'D'], axis=1).columns

# does not create a copy of the dataframe
subset = df.columns.difference(['B', 'D'])

df = df.drop_duplicates(subset=subset)
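
As a rough illustration of the deduplication case above, the sketch below uses a small, made-up frame (the values are hypothetical) in which the first two rows agree on every column except B and D.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 match on A and C, differ on B and D.
df = pd.DataFrame({
    'A': [1, 1, 2],
    'B': [10, 20, 30],
    'C': ['x', 'x', 'y'],
    'D': [0.1, 0.2, 0.3],
})

# Labels of every column except B and D; only an Index is built here.
subset = df.columns.difference(['B', 'D'])

# Duplicates are judged on A and C only, keeping the first occurrence.
deduped = df.drop_duplicates(subset=subset)
```

All four columns are still present in `deduped`; only the comparison is restricted to A and C.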

Answered 2025-04-17 by Python大师


719

You can either drop the columns you do not need or select the ones you need.

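# These are independent alternatives; run one, not all three in sequence.

# Using DataFrame.drop with positional lookup (modifies df in place)
df.drop(df.columns[[1, 2]], axis=1, inplace=True)

# Drop by name (returns a new dataframe)
df1 = df1.drop(['B', 'C'], axis=1)

# Select the ones you want (labels are case-sensitive)
df1 = df[['A', 'D']]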
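
To tie this back to the attempt in the question, here is a sketch of one possible repair. Two things stand out in the original: `set.remove()` mutates the set and returns `None`, so the calls cannot be chained, and a dataframe should be indexed with a list of labels rather than a set. The names `df3` and `df4` are introduced here only for illustration, and `drop(columns=...)` assumes pandas 0.21 or later.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))

# set.remove() returns None, so remove the labels one call at a time.
my_cols = set(df.columns)
my_cols.remove('B')
my_cols.remove('D')

# Index with a list, not a set; sorting makes the column order deterministic.
df2 = df[sorted(my_cols)]

# Alternatives that skip the set entirely and keep the original order.
df3 = df.drop(columns=['B', 'D'])
df4 = df.loc[:, ~df.columns.isin(['B', 'D'])]
```

None of these modify `df` itself, so the original four columns remain available.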

Answered 2025-04-17 by Python大师

