Why do pandas logical operators not align on the index like they should?

Posted 2024-03-28 18:11:54


Consider this simple setup:

x = pd.Series([1, 2, 3], index=list('abc'))
y = pd.Series([2, 3, 3], index=list('bca'))

x

a    1
b    2
c    3
dtype: int64

y

b    2
c    3
a    3
dtype: int64

As you can see, the indexes contain the same labels, just in a different order.

Now, consider a simple logical comparison using the equality (==) operator:

x == y

This throws a ValueError, most likely because the indexes do not match. On the other hand, calling the equivalent eq method works:

x.eq(y)

a    False
b     True
c     True
dtype: bool

The operator also works, provided y is reordered first:

x == y.reindex_like(x)

a    False
b     True
c     True
dtype: bool

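My understanding was that the method and the operator should do the same thing, all else being equal. What is eq doing that the operator comparison isn't?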


3 answers

One thing I like about Python is that you can find almost anything in the source code. In the pandas source, pd.Series.eq calls:

def flex_wrapper(self, other, level=None, fill_value=None, axis=0):
    # other stuff
    # ...

    if isinstance(other, ABCSeries):
        return self._binop(other, op, level=level, fill_value=fill_value)

which then goes to pd.Series._binop:

^{pr2}$

This means the eq method aligns the two Series before comparing them (which, apparently, the plain == operator does not).
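As a rough sketch of that difference (not the actual pandas source), the alignment step can be reproduced by hand with Series.align, using the x and y from the question. The names x_aligned, y_aligned, and manual are only for illustration:

```python
import pandas as pd

x = pd.Series([1, 2, 3], index=list('abc'))
y = pd.Series([2, 3, 3], index=list('bca'))

# Step 1: put both Series on a common index (union of the labels).
x_aligned, y_aligned = x.align(y, join='outer')

# Step 2: compare the underlying values position by position,
# which is now safe because both share the same index.
manual = pd.Series(x_aligned.values == y_aligned.values,
                   index=x_aligned.index)

# The hand-rolled result should agree with the flexible method.
print(manual.equals(x.eq(y)))
```

Once the two Series share an index, the plain == operator is happy as well, which is why x == y.reindex_like(x) in the question works.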

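Back in 2012, before pandas had eq, ne, and the other comparison methods, there was a problem: Series with differently ordered indexes returned unexpected output for the comparison operators (>, <, ==, !=). The fix was to add new functions (gt, ge, ne, ...).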

GitHub ticket: reference

Look at the whole traceback for a comparison of Series with mismatched indexes, focusing in particular on the exception message:

In [1]: import pandas as pd
In [2]: x = pd.Series([1, 2, 3], index=list('abc'))
In [3]: y = pd.Series([2, 3, 3], index=list('bca'))
In [4]: x == y
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-73b2790c1e5e> in <module>()
----> 1 x == y
/usr/lib/python3.7/site-packages/pandas/core/ops.py in wrapper(self, other, axis)
   1188 
   1189         elif isinstance(other, ABCSeries) and not self._indexed_same(other):
-> 1190             raise ValueError("Can only compare identically-labeled "
   1191                              "Series objects")
   1192 
ValueError: Can only compare identically-labeled Series objects

We see that this is a deliberate implementation decision. It is also not unique to Series objects: DataFrames raise a similar error.
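A quick way to check this on your own installation is sketched below. It assumes pandas 0.19.0 or later, and raises_value_error is a hypothetical helper written just for this check, not part of pandas:

```python
import pandas as pd

x = pd.Series([1, 2, 3], index=list('abc'))
y = pd.Series([2, 3, 3], index=list('bca'))


def raises_value_error(left, right):
    """Return True if `left == right` raises ValueError (illustrative helper)."""
    try:
        left == right
    except ValueError:
        return True
    return False


# Same labels, different order: the == operator refuses to compare.
series_raises = raises_value_error(x, y)

# The same check with one-column DataFrames built from x and y.
frame_raises = raises_value_error(x.to_frame('v'), y.to_frame('v'))

print(series_raises, frame_raises)
```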

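Digging through the git blame for the relevant lines eventually turns up some relevant commits and issue tracker threads. For example, Series.__eq__ used to ignore the index of the right-hand side completely, and in a comment on a bug report about that behavior, pandas author Wes McKinney says: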

This is actually a feature / deliberate choice and not a bug it's related to #652. Back in January I changed the comparison methods to do auto-alignment, but found that it led to a large amount of bugs / breakage for users and, in particular, many NumPy functions (which regularly do things like arr[1:] == arr[:-1]; example: np.unique) stopped working.

This gets back to the issue that Series isn't quite ndarray-like enough and should probably not be a subclass of ndarray.

So, I haven't got a good answer for you except for that; auto-alignment would be ideal but I don't think I can do it unless I make Series not a subclass of ndarray. I think this is probably a good idea but not likely to happen until 0.9 or 0.10 (several months down the road).
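To make the NumPy idiom from the quote concrete, here is a small sketch (my own example data, not taken from the issue thread) of why label alignment changes the meaning of arr[1:] == arr[:-1]:

```python
import numpy as np
import pandas as pd

arr = np.array([1, 1, 2, 3, 3])

# Plain NumPy: compare each element with its predecessor, by position.
positional = arr[1:] == arr[:-1]

s = pd.Series(arr)

# Label-aligned comparison: s.iloc[1:] carries labels 1..4 and
# s.iloc[:-1] carries labels 0..3, so every shared label compares
# an element with itself, and the unmatched labels compare to NaN.
aligned = s.iloc[1:].eq(s.iloc[:-1])

print(positional)
print(aligned)
```

With alignment, the comparison no longer detects repeated neighbours at all, which is the kind of breakage the quote describes for functions written against ndarray semantics.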

This was then changed to the current behavior in pandas 0.19.0. Quoting the "what's new" page:

Following Series operators have been changed to make all operators consistent, including DataFrame (GH1134, GH4581, GH13538)

  • Series comparison operators now raise ValueError when index are different.
  • Series logical operators align both index of left and right hand side.
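The second bullet can be seen in a small sketch like the following, assuming boolean Series with the same labels in a different order (the names a, b, and both are just for illustration):

```python
import pandas as pd

a = pd.Series([True, False, True], index=list('abc'))
b = pd.Series([True, True, False], index=list('bca'))

# Unlike ==, the logical operator & aligns on labels before combining,
# so label 'a' pairs a['a'] with b['a'], and so on.
both = a & b

for label in 'abc':
    print(label, bool(both.loc[label]))
```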

This made Series behavior match that of DataFrame, which already rejected mismatched indexes in comparisons.

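In summary, making the comparison operators align indexes automatically turned out to break too much, so this was the best alternative.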
