Python Pandas - Why does the `in` operator work on the index and not the data?

3 votes

1 answer

1082 views

Asked 2025-04-18 13:57

I found out the hard way that the `in` operator in Pandas, when applied to a Series, operates on the index and not on the actual data.

In [1]: import pandas as pd

In [2]: x = pd.Series([1, 2, 3])

In [3]: x.index = [10, 20, 30]

In [4]: x
Out[4]:
10    1
20    2
30    3
dtype: int64

In [5]: 1 in x
Out[5]: False


In [6]: 10 in x
Out[6]: True

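I would have expected the Series x to contain the number 1 rather than the index label 10, but apparently I was wrong. What is the reasoning behind this behavior? Are the following approaches the best alternatives?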

In [7]: 1 in set(x)
Out[7]: True

In [8]: 1 in list(x)
Out[8]: True

In [9]: 1 in x.values
Out[9]: True

Update

I ran some timings on my suggestions. It looks like x.values is the fastest of the three:

In [20]: import numpy as np

In [21]: x = pd.Series(np.random.randint(0, 100000, 1000))

In [22]: x.index = np.arange(900000, 900000 + 1000)

In [23]: x.tail()
Out[23]:
900995    88999
900996    13151
900997    25928
900998    36149
900999    97983
dtype: int64

In [24]: %timeit 36149 in set(x)
10000 loops, best of 3: 190 µs per loop

In [25]: %timeit 36149 in list(x)
1000 loops, best of 3: 638 µs per loop

In [26]: %timeit 36149 in (x.values)
100000 loops, best of 3: 6.86 µs per loop
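For anyone who wants to rerun this comparison outside IPython, here is a minimal sketch using the standard `timeit` module. The seed, the choice of `target`, and the `candidates` dictionary are assumptions made for illustration, and the absolute numbers will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # fixed seed, an arbitrary choice
x = pd.Series(rng.integers(0, 100000, 1000))
x.index = np.arange(900000, 900000 + 1000)

target = int(x.iloc[500])        # a value known to be in the data

# The three value-membership checks from the question
candidates = {
    "set(x)": lambda: target in set(x),
    "list(x)": lambda: target in list(x),
    "x.values": lambda: target in x.values,
}

for name, check in candidates.items():
    assert check()               # each approach finds the value
    seconds = timeit.timeit(check, number=1000)
    print(f"{name:10s} {seconds / 1000 * 1e6:8.2f} µs per call")
```

Note that `set(x)` and `list(x)` pay for building a new container on every call, which is likely a large part of the gap.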

Tags: performance optimization, data processing, data indexing, pandas, alternatives, operator overloading

1 Answer

It may help to think of a pandas.Series as being like a dictionary, where the index values play the role of the keys. Compare:

>>> d = {'a': 1}
>>> 1 in d
False
>>> 'a' in d
True

and:

>>> import pandas
>>> s = pandas.Series([1], index=['a'])
>>> 1 in s
False
>>> 'a' in s
True
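The dictionary analogy goes beyond `in`. As a small sketch (using the `pd` alias rather than the full module name), a Series also exposes `keys()`, `get()` and `items()`, which behave much like their dict counterparts:

```python
import pandas as pd

s = pd.Series([1], index=["a"])

# keys() returns the index, just as dict.keys() returns the keys
print(list(s.keys()))         # ['a']

# get() looks up by label and falls back to a default, like dict.get()
print(s.get("a"))             # 1
print(s.get("z", "missing"))  # missing

# items() yields (label, value) pairs, like dict.items()
print(list(s.items()))        # [('a', 1)]
```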

Note, however, that iterating over a Series iterates over its data, not its index. So list(s) gives [1], not ['a'].
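To make the asymmetry concrete, here is a short sketch that puts label membership, iteration, and a few ways of testing the values side by side. The variable names are made up for illustration, and which value check is preferable depends on the data.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=[10, 20, 30])

print(10 in s)    # True: 10 is a label
print(1 in s)     # False: 1 is a value, not a label
print(list(s))    # [1, 2, 3]: iteration yields the values

# Ways to ask "does the data contain 1?"
in_values = 1 in s.values            # membership on the underlying NumPy array
via_isin = bool(s.isin([1]).any())   # element-wise test, then reduce
via_eq = bool((s == 1).any())        # comparison mask, then reduce
print(in_values, via_isin, via_eq)   # True True True
```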

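In fact, the official documentation has described index values as "must be unique and hashable" (recent pandas versions only require hashability and allow duplicate labels), so I'm guessing there's a hash table under there somewhere.

Answered 2025-04-18 by Python大师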
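As a quick check of the uniqueness part, here is a sketch assuming a reasonably recent pandas version: duplicate labels are accepted, and `in` still answers the question "is this label present?".

```python
import pandas as pd

# Duplicate labels are accepted when building the Series
s = pd.Series([1, 2, 3], index=["a", "a", "b"])

print(s.index.is_unique)   # False
print("a" in s)            # True: membership is still a label lookup
print(len(s.loc["a"]))     # 2: a duplicated label selects every matching row
```

This does not show how the lookup is implemented internally; it only shows that uniqueness is not enforced.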
