Indexing problem when working with a boolean Series


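Asked 2025-04-14 16:12

I think I am running into an indexing problem when filtering data with a DataFrame.

My approach is to apply several filter conditions to the DataFrame. Rather than restricting the DataFrame directly at each step, I build up the filter conditions according to my own logic.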

# df is the input; it has already been filtered, which I think messed with its indexing
from pandas import Series

mask = Series([True] * df.shape[0])

if some_filter is not None:
    col_mask = df['aCol'] == some_filter
    mask = mask & col_mask

Here, if I look at:

mask.shape
col_mask.shape

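before the last line (mask = mask & col_mask) runs, the two shapes are exactly the same.

After the last line, however, mask.shape reports almost twice as many rows as before. I think this is because a Series is indexed, the two indexes do not match, and the boolean operation is effectively filling in the gaps.

I can think of a few workarounds, but I was hoping to find a more proper way to handle this.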
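The behaviour described above can be reproduced with a small sketch. The frame below is made-up data (the column values and the index [0, 1, 3, 5] are assumptions for illustration, not the real df); it only shows that `&` between two Series aligns on the union of their index labels rather than combining them position by position.

```python
import pandas as pd

# Hypothetical data: a frame whose index has gaps, as if rows had been filtered out earlier.
df = pd.DataFrame({"aCol": ["x", "y", "x", "y"]}, index=[0, 1, 3, 5])

mask = pd.Series([True] * df.shape[0])   # gets a default index: 0, 1, 2, 3
col_mask = df["aCol"] == "x"             # keeps df's index: 0, 1, 3, 5

# The two Series are aligned by label, so the result covers every label
# that appears in either index.
combined = mask & col_mask

print(mask.shape, col_mask.shape)   # (4,) (4,)
print(combined.shape)               # (5,)
print(sorted(combined.index))       # [0, 1, 2, 3, 5]
```

The less the two indexes overlap, the closer the combined length gets to the sum of both lengths, which would explain a result that is almost twice as long.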

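Tags: data processing, data analysis, data filtering, filter conditions, pandas, dataframe, boolean indexing, indexing problems

1 Answer

As you said yourself, your DataFrame does not have a contiguous index, for example df.index == [0, 1, 3, 5]. You can pass an np.array directly; a NumPy array carries no index, so the & is applied position by position: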

import numpy as np

# a plain list also works
# mask = [True] * len(df)
mask = np.array([True] * len(df))
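As a rough illustration of the NumPy variant, the sketch below uses made-up data (the frame contents and the filter value "x" are assumptions). Because the array has no labels, the combination is positional, and the result keeps the same length as df.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-contiguous index.
df = pd.DataFrame({"aCol": ["x", "y", "x", "y"]}, index=[0, 1, 3, 5])

mask = np.array([True] * len(df))   # no index attached
col_mask = df["aCol"] == "x"

# Combined element by element; there are no labels on `mask` to align.
combined = mask & col_mask
filtered = df[combined]

print(len(combined))          # 4
print(list(filtered.index))   # [0, 3]
```

The trade-off is that positional masks rely on the array staying in the same row order as df.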

Alternatively, pass df's index to the Series constructor so that the mask is aligned with df:

import pandas as pd
mask = pd.Series([True] * len(df), index=df.index)

With either of these, the rest of your code runs as expected:

if some_filter is not None:
    col_mask = df['aCol'] == some_filter
    mask = mask & col_mask
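Putting the pieces together, here is a minimal end-to-end sketch of the index-aligned approach. The helper name `build_mask`, the second column `bCol`, the `min_value` parameter, and the sample data are all hypothetical and only serve to show several optional filters being combined.

```python
import pandas as pd


def build_mask(df, some_filter=None, min_value=None):
    """Combine optional filters into one boolean mask that shares df's index."""
    # Start with an all-True mask labelled with df's own index.
    mask = pd.Series(True, index=df.index)
    if some_filter is not None:
        mask = mask & (df["aCol"] == some_filter)
    if min_value is not None:
        mask = mask & (df["bCol"] >= min_value)
    return mask


# Hypothetical data with a non-contiguous index.
df = pd.DataFrame(
    {"aCol": ["x", "y", "x", "y"], "bCol": [1, 2, 3, 4]},
    index=[0, 1, 3, 5],
)

mask = build_mask(df, some_filter="x", min_value=2)
result = df[mask]

print(mask.shape)           # (4,)
print(list(result.index))   # [3]
```

Since every intermediate mask is derived from df itself, all of them share df.index and the combined mask never grows.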

Answered 2025-04-14 by Python大师
