Answer to the question: Logical operators for boolean indexing in Pandas


Anonymous, 1 day ago

<h2>TL;DR: the logical operators in pandas are <code>&</code>, <code>|</code> and <code>~</code>, and parentheses <code>(...)</code> are important!</h2>
Python's <code>and</code>, <code>or</code> and <code>not</code> logical operators are designed to work with scalars. pandas therefore overloads the bitwise operators to provide a vectorized (element-wise) version of the same functionality.
So the following Python expressions (where <code>exp1</code> and <code>exp2</code> are expressions that evaluate to a boolean result)...
<pre><code>exp1 and exp2   # Logical AND
exp1 or exp2    # Logical OR
not exp1        # Logical NOT
</code></pre>
...translate to the following in pandas:
<pre><code>exp1 & exp2     # Element-wise logical AND
exp1 | exp2     # Element-wise logical OR
~exp1           # Element-wise logical NOT
</code></pre>
If you get a <code>ValueError</code> while performing a logical operation, you need to group the operands with parentheses:
<pre><code>(exp1) op (exp2)
</code></pre>
For example:
<pre><code>(df['col1'] == x) & (df['col2'] == y)
</code></pre>
And so on.
<hr/>
<a href="https://pandas-docs.github.io/pandas-docs-travis/user_guide/indexing.html#boolean-indexing" rel="noreferrer">Boolean Indexing</a>: a common operation is to compute boolean masks from logical conditions in order to filter data. pandas provides three operators: <code>&</code> for logical AND, <code>|</code> for logical OR, and <code>~</code> for logical NOT.
Consider the following setup:
<pre><code>import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (5, 3)), columns=list('ABC'))
df

   A  B  C
0  5  0  3
1  3  7  9
2  3  5  2
3  4  7  6
4  8  8  1
</code></pre>
<h2>Logical AND</h2>
For the <code>df</code> above, suppose you want to return all rows where <code>A < 5</code> and <code>B > 5</code>. This is done by computing a mask for each condition separately and then ANDing the masks.
<h3>Overloaded bitwise <code>&</code> operator</h3>
Before continuing, note this particular excerpt from the docs, which states:
<blockquote> Another common operation is the use of boolean vectors to filter the data. The operators are: <code>|</code> for <code>or</code>, <code>&</code> for <code>and</code>, and <code>~</code> for <code>not</code>. These must be grouped by using parentheses, since by default Python will evaluate an expression such as <code>df.A > 2 & df.B < 3</code> as <code>df.A > (2 & df.B) < 3</code>, while the desired evaluation order is <code>(df.A > 2) & (df.B < 3)</code>.
</blockquote>
With this in mind, element-wise logical AND can be implemented with the bitwise operator <code>&</code>:
<pre><code>df['A'] < 5

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'] > 5

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool
</code></pre>
<pre><code>(df['A'] < 5) & (df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool
</code></pre>
The subsequent filtering step is then simply:
<pre><code>df[(df['A'] < 5) & (df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6
</code></pre>
The parentheses override the default precedence of the bitwise operators, which bind more tightly than the comparison operators <code><</code> and <code>></code>. See the <a href="https://docs.python.org/3/reference/expressions.html#operator-precedence" rel="noreferrer">Operator Precedence</a> section of the Python docs.
If you do not use parentheses, the expression is evaluated incorrectly. For example, if you accidentally write
<pre><code>df['A'] < 5 & df['B'] > 5
</code></pre>
it is parsed as
<pre><code>df['A'] < (5 & df['B']) > 5
</code></pre>
which becomes
<pre><code>df['A'] < something_you_dont_want > 5
</code></pre>
which becomes (see the Python docs on <a href="https://docs.python.org/3/reference/expressions.html#comparisons" rel="noreferrer">chained operator comparison</a>)
<pre><code>(df['A'] < something_you_dont_want) and (something_you_dont_want > 5)
</code></pre>
which becomes
<pre><code># Both operands are Series...
something_else_you_dont_want1 and something_else_you_dont_want2</code></pre>
which raises
<pre><code>ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
</code></pre>
So, don't make that mistake!<sup>1</sup>
<h3>Avoiding parentheses grouping</h3>
The fix is actually quite simple. Most comparison operators have a corresponding method on Series and DataFrames. If the individual masks are built with these methods instead of the comparison operators, you no longer need parentheses to specify the evaluation order:
<pre><code>df['A'].lt(5)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df['B'].gt(5)

0    False
1     True
2    False
3     True
4     True
Name: B, dtype: bool
</code></pre>
<pre><code>df['A'].lt(5) & df['B'].gt(5)

0    False
1     True
2    False
3     True
4    False
dtype: bool
</code></pre>
See the section on <a href="https://pandas.pydata.org/pandas-docs/stable/basics.html#flexible-comparisons" rel="noreferrer">Flexible Comparisons</a>. To summarise, we have
<pre><code>╒════╤════════════╤════════════╕
│    │ Operator   │ Function   │
╞════╪════════════╪════════════╡
│  0 │ >          │ gt         │
├────┼────────────┼────────────┤
│  1 │ >=         │ ge         │
├────┼────────────┼────────────┤
│  2 │ <          │ lt         │
├────┼────────────┼────────────┤
│  3 │ <=         │ le         │
├────┼────────────┼────────────┤
│  4 │ ==         │ eq         │
├────┼────────────┼────────────┤
│  5 │ !=         │ ne         │
╘════╧════════════╧════════════╛
</code></pre>
Another option for avoiding parentheses is to use <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html" rel="noreferrer"><code>DataFrame.query</code></a> (or <code>eval</code>):
<pre><code>df.query('A < 5 and B > 5')

   A  B  C
1  3  7  9
3  4  7  6
</code></pre>
I have documented <code>query</code> and <code>eval</code> extensively in <a href="https://stackoverflow.com/q/53779986/4909087">Dynamic Expression Evaluation in pandas using pd.eval()</a>.
<h3><a href="https://docs.python.org/3/library/operator.html#operator.and_" rel="noreferrer"><code>operator.and_</code></a></h3>
This lets you perform the operation in a functional style. Internally it calls <code>Series.__and__</code>, which corresponds to the bitwise operator.
<pre><code>import operator

operator.and_(df['A'] < 5, df['B'] > 5)
# Same as,
# (df['A'] < 5).__and__(df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
dtype: bool

df[operator.and_(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6
</code></pre>
You won't usually need this, but it is useful to know.
<h3>Generalizing: <a href="https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.logical_and.html" rel="noreferrer"><code>np.logical_and</code></a> (and <code>logical_and.reduce</code>)</h3>
Another alternative is <code>np.logical_and</code>, which also does not need parentheses grouping:
<pre><code>np.logical_and(df['A'] < 5, df['B'] > 5)

0    False
1     True
2    False
3     True
4    False
Name: A, dtype: bool

df[np.logical_and(df['A'] < 5, df['B'] > 5)]

   A  B  C
1  3  7  9
3  4  7  6
</code></pre>
<code>np.logical_and</code> is a <a href="https://docs.scipy.org/doc/numpy-1.15.1/reference/ufuncs.html" rel="noreferrer">ufunc (Universal Functions)</a>, and most ufuncs have a <a href="https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.ufunc.reduce.html" rel="noreferrer"><code>reduce</code></a> method. This makes <code>logical_and</code> easier to generalise when you have several masks to AND together. For example, to AND the masks <code>m1</code>, <code>m2</code> and <code>m3</code> with <code>&</code>, you would have to write
<pre><code>m1 & m2 & m3
</code></pre>
However, an easier option is
<pre><code>np.logical_and.reduce([m1, m2, m3])
</code></pre>
This is powerful because it lets you build more complex logic on top of it (for example, generating masks dynamically in a list comprehension and ANDing all of them):
<pre><code>cols = ['A', 'B']
ops = [np.less, np.greater]
values = [5, 5]

m = np.logical_and.reduce([op(df[c], v) for op, c, v in zip(ops, cols, values)])
m
# array([False,  True, False,  True, False])

df[m]

   A  B  C
1  3  7  9
3  4  7  6
</code></pre>
<sup>1</sup> I know I'm harping on this point, but please bear with me. This is a very common beginner's error and must be explained very thoroughly.
<hr/>
<h2>Logical OR</h2>
For the <code>df</code> above, suppose you want to return all rows where <code>A == 3</code> or <code>B == 7</code>.
<h3>Overloaded bitwise <code>|</code></h3>
<pre><code>df['A'] == 3

0    False
1     True
2     True
3    False
4    False
Name: A, dtype: bool

df['B'] == 7

0    False
1     True
2    False
3     True
4    False
Name: B, dtype: bool
</code></pre>
<pre><code>(df['A'] == 3) | (df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[(df['A'] == 3) | (df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6
</code></pre>
If you haven't yet, please also read the section on Logical AND above; all of its caveats apply here.
Alternatively, this operation can be written as
<pre><code>df[df['A'].eq(3) | df['B'].eq(7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6
</code></pre>
<h3><a href="https://docs.python.org/3/library/operator.html#operator.or_" rel="noreferrer"><code>operator.or_</code></a></h3>
Calls <code>Series.__or__</code> under the hood.
<pre><code>operator.or_(df['A'] == 3, df['B'] == 7)
# Same as,
# (df['A'] == 3).__or__(df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
dtype: bool

df[operator.or_(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6
</code></pre>
<h3><a href="https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.logical_or.html#numpy.logical_or" rel="noreferrer"><code>np.logical_or</code></a></h3>
For two conditions, use <code>logical_or</code>:
<pre><code>np.logical_or(df['A'] == 3, df['B'] == 7)

0    False
1     True
2     True
3     True
4    False
Name: A, dtype: bool

df[np.logical_or(df['A'] == 3, df['B'] == 7)]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6
</code></pre>
For multiple masks, use <code>logical_or.reduce</code>:
<pre><code>np.logical_or.reduce([df['A'] == 3, df['B'] == 7])
# array([False,  True,  True,  True, False])

df[np.logical_or.reduce([df['A'] == 3, df['B'] == 7])]

   A  B  C
1  3  7  9
2  3  5  2
3  4  7  6
</code></pre>
<hr/>
<h2>Logical NOT</h2>
Given a mask, such as
<pre><code>mask = pd.Series([True, True, False])
</code></pre>
if you need to invert every boolean value (so that the end result is <code>[False, False, True]</code>), you can use any of the methods below.
<h3>Bitwise <code>~</code></h3>
<pre><code>~mask

0    False
1    False
2     True
dtype: bool
</code></pre>
Again, expressions need to be parenthesised.
<pre><code>~(df['A'] == 3)

0     True
1    False
2    False
3     True
4     True
Name: A, dtype: bool
</code></pre>
This internally calls
<pre><code>mask.__invert__()

0    False
1    False
2     True
dtype: bool
</code></pre>
But don't call it directly.
<h3><code>operator.inv</code></h3>
Internally calls <code>__invert__</code> on the Series.
<pre><code>operator.inv(mask)

0    False
1    False
2     True
dtype: bool
</code></pre>
<h3><a href="https://docs.scipy.org/doc/numpy-1.15.1/reference/generated/numpy.logical_not.html#numpy.logical_not" rel="noreferrer"><code>np.logical_not</code></a></h3>
This is the NumPy variant.
<pre><code>np.logical_not(mask)

0    False
1    False
2     True
dtype: bool
</code></pre>
<hr/>
Note that, on boolean masks, <code>np.bitwise_and</code> can be used in place of <code>np.logical_and</code>, <code>bitwise_or</code> in place of <code>logical_or</code>, and <code>invert</code> in place of <code>logical_not</code>.
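As a compact recap of the answer above, here is a minimal, self-contained sketch that reproduces the precedence pitfall and the main ways of combining masks. It assumes a recent pandas and NumPy are installed. The frame is written out by hand with the same values as the <code>df</code> in the answer so that the result does not depend on the random seed, and the variable names (<code>raised</code>, <code>mask_parens</code> and so on) are illustrative only.

```python
import numpy as np
import pandas as pd

# Same values as the df in the answer, written out explicitly.
df = pd.DataFrame({"A": [5, 3, 3, 4, 8],
                   "B": [0, 7, 5, 7, 8],
                   "C": [3, 9, 2, 6, 1]})

# 1. Without parentheses, & binds more tightly than < and >, so Python
#    ends up asking for the truth value of a whole Series and pandas raises.
try:
    df["A"] < 5 & df["B"] > 5
    raised = False
except ValueError:
    raised = True

# 2. Three equivalent ways to build the element-wise AND mask.
mask_parens = (df["A"] < 5) & (df["B"] > 5)
mask_methods = df["A"].lt(5) & df["B"].gt(5)
mask_reduce = np.logical_and.reduce([df["A"] < 5, df["B"] > 5])  # ndarray

# 3. OR and NOT follow the same pattern.
mask_or = (df["A"] == 3) | (df["B"] == 7)
mask_not = ~(df["A"] == 3)

print(raised)                          # True
print(df[mask_parens].index.tolist())  # [1, 3]
print(df[mask_reduce].index.tolist())  # [1, 3]
print(df[mask_or].index.tolist())      # [1, 2, 3]
print(df[mask_not].index.tolist())     # [0, 3, 4]
```

The parenthesised form is usually the most readable for two or three conditions, while the <code>reduce</code> form tends to pay off when the list of masks is built programmatically.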
