The most efficient way to exclude rows by index label in a pandas DataFrame

import pandas as pd
import numpy as np

data = pd.DataFrame(np.arange(9).reshape((3, 3)),
                    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

2 answers

User

#1 · edited 2024-06-12 06:36:13

This is a robust solution that also works for MultiIndex objects.

Single index

excluded = ['Ohio']
indices = data.index.get_level_values('state').difference(excluded)
indx = pd.IndexSlice[indices.values]

Output

In [77]: data.loc[indx]
Out[77]:
number    one  two  three
state
Colorado    3    4      5
New York    6    7      8
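
As a rough, self-contained sketch of the single-index case: the snippet below rebuilds the question's frame and compares the Index.difference approach with a boolean mask built from Index.isin. The mask variant is an addition for comparison, not part of the answer above. It excludes 'Colorado' rather than 'Ohio' so that the difference in row order becomes visible, since Index.difference sorts its result by default.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)

excluded = ['Colorado']

# Approach from the answer: labels that are NOT excluded, then label-based lookup.
# Index.difference sorts the remaining labels by default.
indices = data.index.get_level_values('state').difference(excluded)
by_difference = data.loc[indices]

# Alternative (added here for comparison): a boolean mask keeps the original row order.
by_mask = data[~data.index.isin(excluded)]

print(by_difference)
print(by_mask)
```

Both variants keep the same rows; which one fits better probably depends on whether the original row order matters.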

Extending to a MultiIndex

Here I'll extend the approach to a MultiIndex example...

# Note: the MultiIndex keyword `labels` was renamed to `codes` in pandas 0.24.
data = pd.DataFrame(np.arange(18).reshape(6, 3),
                    index=pd.MultiIndex(levels=[['AU', 'UK'],
                                                ['Derby', 'Kensington', 'Newcastle', 'Sydney']],
                                        codes=[[0, 0, 0, 1, 1, 1], [0, 2, 3, 0, 1, 2]],
                                        names=['country', 'town']),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

Suppose we want to exclude both occurrences of 'Newcastle' from this new MultiIndex:

excluded = ['Newcastle']
indices = data.index.get_level_values('town').difference(excluded)
indx = pd.IndexSlice[:, indices.values]

which gives the expected result:

In [115]: data.loc[indx, :]
Out[115]:
number              one  two  three
country town
AU      Derby         0    1      2
        Sydney        6    7      8
UK      Derby         9   10     11
        Kensington   12   13     14
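
For readers on a recent pandas release, here is a minimal runnable sketch of the MultiIndex case. It assumes pandas 0.24 or later, where MultiIndex takes codes= instead of labels=. The boolean-mask line at the end is an alternative added for comparison and is not part of the answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    index=pd.MultiIndex(
        levels=[['AU', 'UK'], ['Derby', 'Kensington', 'Newcastle', 'Sydney']],
        codes=[[0, 0, 0, 1, 1, 1], [0, 2, 3, 0, 1, 2]],
        names=['country', 'town'],
    ),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)

excluded = ['Newcastle']
towns = data.index.get_level_values('town')

# Approach from the answer: slice every country, keep only the non-excluded towns.
indx = pd.IndexSlice[:, towns.difference(excluded).values]
by_slice = data.loc[indx, :]

# Alternative (added for comparison): mask on one level of the MultiIndex.
by_mask = data[~towns.isin(excluded)]

print(by_slice)
```

The mask version does not depend on the index being sorted, which may make it the simpler choice when only one level is involved.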

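Common pitfalls

Make sure all levels of the index are sorted; you may need data.sort_index(inplace=True).
Make sure to include the empty slice for the columns: data.loc[indx, :].
Sometimes indx = pd.IndexSlice[:, indices] is enough, but I find I often need indx = pd.IndexSlice[:, indices.values].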
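
The pitfalls above can be sketched roughly as follows. The small frame here is made up for illustration (it is not the answer's data), and the snippet uses the non-inplace form of sort_index; whether an unsorted index actually causes an error may depend on the pandas version and on the kind of slice used, so this only shows how to check and sort before selecting.

```python
import pandas as pd

# Hypothetical frame with a deliberately unsorted MultiIndex.
idx = pd.MultiIndex.from_tuples(
    [('UK', 'Newcastle'), ('AU', 'Sydney'), ('UK', 'Derby'), ('AU', 'Derby')],
    names=['country', 'town'],
)
frame = pd.DataFrame({'one': [0, 1, 2, 3]}, index=idx)

was_sorted = frame.index.is_monotonic_increasing   # unsorted at this point

# Sort all index levels before slicing (returns a new, sorted frame).
frame = frame.sort_index()
now_sorted = frame.index.is_monotonic_increasing

towns = frame.index.get_level_values('town').difference(['Newcastle'])

# Pass plain values to IndexSlice and include the column slice explicitly.
kept = frame.loc[pd.IndexSlice[:, towns.values], :]
print(kept)
```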

User

#2 · edited 2024-06-12 06:36:13

This is a Python issue rather than a pandas one: 'state' != 'Colorado' evaluates to True, so all pandas ever receives is data.ix[[True]].
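
To make the point concrete, the following sketch contrasts the two comparisons using the question's frame. The variable names are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)

# Two plain Python strings: compared immediately, pandas is never involved.
python_result = 'state' != 'Colorado'

# An Index compared with a string: pandas performs the comparison elementwise.
index_result = data.index != 'Colorado'

print(type(python_result).__name__, python_result)
print(type(index_result).__name__, index_result)
```

The first comparison collapses to a single bool before any indexing happens, while the second yields one boolean per row, which is what row selection needs.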

You could do:

>>> data.loc[data.index != "Colorado"]
number    one  two  three
state                    
Ohio        0    1      2
New York    6    7      8

[2 rows x 3 columns]

Or use DataFrame.query():

>>> data.query("state != 'New York'")
number    one  two  three
state                    
Ohio        0    1      2
Colorado    3    4      5

[2 rows x 3 columns]

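The query() form is handy if you'd rather not repeat data in the expression. (Quoting the expression passed to the .query() method is one of the few ways around the fact that Python would otherwise evaluate the comparison before pandas ever sees it.)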
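
As a small sketch of the query() route, the snippet below checks that the quoted expression selects the same rows as the boolean mask, and also shows excluding several labels at once. The multi-label form relies on query()'s @ syntax for referencing a local variable; the excluded list is a made-up example, not something from the answer.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)

# The index name 'state' can be used inside the quoted expression.
kept = data.query("state != 'New York'")

# The same selection written as a boolean mask.
kept_mask = data.loc[data.index != 'New York']

# Excluding several labels: @excluded refers to the local Python variable.
excluded = ['Ohio', 'New York']
kept_many = data.query("state not in @excluded")

print(kept)
print(kept_many)
```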
