Boolean indexing on row and column MultiIndexes in Pandas



The questions come at the end. First, let's set up some data:

<pre><code>import numpy as np
import pandas as pd
from itertools import product

np.random.seed(1)

team_names = ['Yankees', 'Mets', 'Dodgers']
jersey_numbers = [35, 71, 84]
game_numbers = [1, 2]
observer_names = ['Bill', 'John', 'Ralph']
observation_types = ['Speed', 'Strength']

# One record per (team, jersey, game, observer, obstype) combination
row_indices = list(product(team_names, jersey_numbers, game_numbers,
                           observer_names, observation_types))
observation_values = np.random.randn(len(row_indices))

tns, jns, gns, ons, ots = zip(*row_indices)

data = pd.DataFrame({'team': tns, 'jersey': jns, 'game': gns,
                     'observer': ons, 'obstype': ots,
                     'value': observation_values})

# Rows are indexed by (team, jersey, game); columns by (observer, obstype)
data = data.set_index(['team', 'jersey', 'game', 'observer', 'obstype'])
data = data.unstack(['observer', 'obstype'])
data.columns = data.columns.droplevel(0)
</code></pre>

This gives:

<img src="https://i.stack.imgur.com/xoly8.png" alt="data"/>

I want to extract a subset of this DataFrame for later analysis. Say I want to slice out the rows where the <code>jersey</code> number is 71. I really don't like using <code>xs</code> for this: when you take a cross-section with <code>xs</code>, you lose the index level you selected on. If I take the <code>xs</code> cross-section for jersey 71, I get the right rows, but the <code>jersey</code> level is gone from the result.

<img src="https://i.stack.imgur.com/XXrEc.png" alt="xs_slice"/>

<code>xs</code> also doesn't seem like a good fit when I want several different values from the <code>jersey</code> level. I think a nicer solution is the one found <a href="https://stackoverflow.com/questions/11941492/selecting-rows-from-a-pandas-dataframe-with-a-compound-hierarchical-index#comment15917600_11942697">here</a>:

<pre><code>data[[j in [71, 84] for t, j, g in data.index]]
</code></pre>

<img src="https://i.stack.imgur.com/zXOJr.png" alt="boolean_slice_1"/>

You can even select on a combination of jerseys and teams:

<pre><code>data[[j in [71, 84] and t in ['Dodgers', 'Mets'] for t, j, g in data.index]]
</code></pre>

<img src="https://i.stack.imgur.com/h8dWg.png" alt="boolean_slice_2"/>

Nice!

So the question is: how can I do something similar to select a subset of columns? For example, say I only want the columns holding data from Ralph. How do I do that without using <code>xs</code>? Or what if I only want the columns with <code>observer in ['John', 'Ralph']</code>? Again, I'd prefer a solution that keeps every level of the row and column indexes in the result, just like the boolean indexing examples above.

I can do what I want, and even combine selections on the row and column indexes. But the only solution I've found involves some real gymnastics:

<pre><code>data[[j in [71, 84] and t in ['Dodgers', 'Mets'] for t, j, g in data.index]]\
    .T[[obs in ['John', 'Ralph'] for obs, obstype in data.columns]].T
</code></pre>

<img src="https://i.stack.imgur.com/eS5MZ.png" alt="double_boolean_slice"/>

So the second question is: is there a more concise way to do what I just did?
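One possible approach, offered as a sketch rather than as the accepted answer: build boolean masks from the index levels with `get_level_values(...).isin(...)` and pass both masks to `.loc` in a single call, which avoids the double transpose. The names `row_mask`, `col_mask` and `subset` are illustrative. To keep the block self-contained, the frame is rebuilt directly with `pd.MultiIndex.from_tuples`, which is assumed to give the same labels and shape as the setup in the question. The last two lines show that `xs` also accepts `drop_level=False`, which keeps the selected level when only a single value is needed.

```python
import numpy as np
import pandas as pd
from itertools import product

np.random.seed(1)

# Rebuild a frame with the same row and column levels as in the question.
rows = list(product(['Yankees', 'Mets', 'Dodgers'], [35, 71, 84], [1, 2]))
cols = list(product(['Bill', 'John', 'Ralph'], ['Speed', 'Strength']))
data = pd.DataFrame(
    np.random.randn(len(rows), len(cols)),
    index=pd.MultiIndex.from_tuples(rows, names=['team', 'jersey', 'game']),
    columns=pd.MultiIndex.from_tuples(cols, names=['observer', 'obstype']),
)

# Boolean mask over rows: jersey in {71, 84} and team in {Dodgers, Mets}.
row_mask = (
    data.index.get_level_values('jersey').isin([71, 84])
    & data.index.get_level_values('team').isin(['Dodgers', 'Mets'])
)

# Boolean mask over columns: observer in {John, Ralph}.
col_mask = data.columns.get_level_values('observer').isin(['John', 'Ralph'])

# .loc takes one mask per axis, and every index level survives.
subset = data.loc[row_mask, col_mask]

# For a single value, xs can keep the selected level via drop_level=False.
jersey_71 = data.xs(71, level='jersey', drop_level=False)
ralph_only = data.xs('Ralph', axis=1, level='observer', drop_level=False)
```

Either mask can also be used on its own, for example `data.loc[:, col_mask]` to filter only the columns.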


