Answers to the question: Select rows containing specific values from a DataFrame

<h2>Introduction</h2> At the core of selecting rows, we need a 1D mask, that is, a pandas Series (or array) of boolean elements whose length equals the length of <code>df</code>. Let's call it <code>mask</code>. With <code>df[mask]</code> we then get the selected rows of <code>df</code> through <a href="https://numpy.org/doc/stable/user/basics.indexing.html#boolean-or-mask-index-arrays" rel="noreferrer">boolean-indexing</a>. Here is our starting <code>df</code>: <pre><code>In [42]: df
Out[42]:
        A       B      C
1   apple  banana   pear
2    pear    pear  apple
3  banana    pear   pear
4   apple   apple   pear
</code></pre> <h3>I. Match one string</h3> If we need to match just one string, it is straightforward with element-wise equality: <pre><code>In [42]: df == 'banana'
Out[42]:
       A      B      C
1  False   True  False
2  False  False  False
3   True  False  False
4  False  False  False
</code></pre> If we need to look for <code>ANY</code> one match in each row, use the <code>.any</code> method: <pre><code>In [43]: (df == 'banana').any(axis=1)
Out[43]:
1     True
2    False
3     True
4    False
dtype: bool
</code></pre> To select the corresponding rows: <pre><code>In [44]: df[(df == 'banana').any(axis=1)]
Out[44]:
        A       B     C
1   apple  banana  pear
3  banana    pear  pear
</code></pre> <hr/> <h3>II. Match multiple strings</h3> 1. Search for <code>ANY</code> match. Here is our starting <code>df</code>: <pre><code>In [42]: df
Out[42]:
        A       B      C
1   apple  banana   pear
2    pear    pear  apple
3  banana    pear   pear
4   apple   apple   pear
</code></pre> NumPy's <a href="https://numpy.org/doc/stable/reference/generated/numpy.isin.html" rel="noreferrer"><code>np.isin</code></a> works here (or use pandas.isin as listed in other posts) to get all matches from the list of search strings in <code>df</code>. So, say we are looking for <code>'pear'</code> or <code>'apple'</code> in <code>df</code>: <pre><code>In [51]: np.isin(df, ['pear','apple'])
Out[51]:
array([[ True, False,  True],
       [ True,  True,  True],
       [False,  True,  True],
       [ True,  True,  True]])

# ANY match along each row
In [52]: np.isin(df, ['pear','apple']).any(axis=1)
Out[52]: array([ True,  True,  True,  True])

# Select corresponding rows with masking
In [56]: df[np.isin(df, ['pear','apple']).any(axis=1)]
Out[56]:
        A       B      C
1   apple  banana   pear
2    pear    pear  apple
3  banana    pear   pear
4   apple   apple   pear
</code></pre> 2. Search for <code>ALL</code> matches. Here is our starting <code>df</code>: <pre><code>In [42]: df
Out[42]:
        A       B      C
1   apple  banana   pear
2    pear    pear  apple
3  banana    pear   pear
4   apple   apple   pear
</code></pre>
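The session output prints <code>df</code> without showing how it is built. A minimal sketch that reconstructs it and reproduces the single-string and <code>ANY</code>-match selections as a standalone script. The constructor call and the names <code>mask_banana</code> and <code>mask_any</code> are assumptions for illustration (the values are read off the printed frame), and <code>DataFrame.isin</code> is used here as the pandas counterpart of <code>np.isin</code>:

```python
import pandas as pd

# Rebuild the example frame. Index 1..4 and columns A, B, C are read off the
# printed output, so this constructor call is an assumption, not the author's code.
df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

# Single string: element-wise comparison, then reduce across the columns of each row.
mask_banana = (df == "banana").any(axis=1)
rows_banana = df[mask_banana]

# Several strings, ANY match: a row qualifies if at least one cell is in the list.
mask_any = df.isin(["pear", "apple"]).any(axis=1)
rows_any = df[mask_any]

print(rows_banana)
print(rows_any)
```

Both masks are boolean Series aligned to the index of <code>df</code>, so they can be combined with <code>&amp;</code> and <code>|</code> before indexing if more than one condition is needed.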
So, now we are looking for rows that contain <code>BOTH</code> items, say <code>['pear','apple']</code>. We will make use of <code>NumPy-broadcasting</code>: <pre><code>In [66]: np.equal.outer(df.to_numpy(copy=False), ['pear','apple']).any(axis=1)
Out[66]:
array([[ True,  True],
       [ True,  True],
       [ True, False],
       [ True,  True]])
</code></pre> We have a search list of <code>2</code> items, so we get a 2D mask with <code>number of rows = len(df)</code> and <code>number of cols = number of search items</code>. In the result above, the first column is for <code>'pear'</code> and the second one for <code>'apple'</code>. To make things concrete, let us get a mask for three items, <code>['apple','banana', 'pear']</code>: <pre><code>In [62]: np.equal.outer(df.to_numpy(copy=False), ['apple','banana', 'pear']).any(axis=1)
Out[62]:
array([[ True,  True,  True],
       [ True, False,  True],
       [False,  True,  True],
       [ True, False,  True]])
</code></pre> The columns of this mask are for <code>'apple','banana', 'pear'</code> respectively. Coming back to the case of <code>2</code> search items, we had earlier: <pre><code>In [66]: np.equal.outer(df.to_numpy(copy=False), ['pear','apple']).any(axis=1)
Out[66]:
array([[ True,  True],
       [ True,  True],
       [ True, False],
       [ True,  True]])
</code></pre> Since we are looking for <code>ALL</code> matches in each row, we reduce across the columns of this mask: <pre><code>In [67]: np.equal.outer(df.to_numpy(copy=False), ['pear','apple']).any(axis=1).all(axis=1)
Out[67]: array([ True,  True, False,  True])
</code></pre> Finally, select the rows: <pre><code>In [70]: df[np.equal.outer(df.to_numpy(copy=False), ['pear','apple']).any(axis=1).all(axis=1)]
Out[70]:
       A       B      C
1  apple  banana   pear
2   pear    pear  apple
4  apple   apple   pear
</code></pre>
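The <code>ALL</code>-match steps above can be wrapped into a small function. This is a sketch under stated assumptions, not part of the original answer: the helper name <code>rows_with_all</code> is hypothetical, the frame is rebuilt from the printed output, and both operands are cast to <code>object</code> dtype so the comparison runs as plain element-wise equality.

```python
import numpy as np
import pandas as pd


def rows_with_all(frame, items):
    """Return the rows of `frame` that contain every value in `items`.

    Hypothetical helper wrapping the np.equal.outer steps shown above.
    """
    values = frame.to_numpy(dtype=object)
    targets = np.asarray(items, dtype=object)
    # Shape (n_rows, n_cols, n_items): does each cell equal each search item?
    hits = np.equal.outer(values, targets)
    # any over columns -> (n_rows, n_items): is the item found somewhere in the row?
    # all over items   -> (n_rows,): are all items found in the row?
    mask = hits.any(axis=1).all(axis=1)
    return frame[mask]


# Example frame reconstructed from the printed output (an assumption).
df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

both = rows_with_all(df, ["pear", "apple"])
all_three = rows_with_all(df, ["apple", "banana", "pear"])
print(both)
print(all_three)
```

The intermediate array has one boolean per (row, column, search item) combination, so memory grows with the product of all three sizes. For large frames or long search lists, reducing one search item at a time may be the better trade-off.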
