Pandas equivalent of R's which()

User

Answer 1 · edited 2024-04-19 13:02:16

I may not fully understand the question, but it seems easier to answer than you think:

Using a pandas DataFrame:

df['colname'] > somenumberIchoose

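returns a pandas Series of True/False values that carries the DataFrame's original index.

You can then use that boolean Series on the original DataFrame to get the subset you are looking for: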

df[df['colname'] > somenumberIchoose]

That should be enough.
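If what you want is closer to what R's which() returns, i.e. the positions of the matching rows rather than the rows themselves, the mask can be turned into labels or integer positions. The following is only a sketch: the DataFrame contents, the column name, and `threshold` are made up for illustration, and note that positions here are 0-based, whereas R's are 1-based.

```python
import numpy as np
import pandas as pd

# Hypothetical data, purely for illustration.
df = pd.DataFrame({"colname": [5, 12, 7, 20]}, index=["a", "b", "c", "d"])
threshold = 10  # stand-in for "somenumberIchoose"

mask = df["colname"] > threshold  # boolean Series aligned with df.index

labels = df.index[mask].tolist()                       # index labels where the mask is True
positions = np.flatnonzero(mask.to_numpy()).tolist()   # 0-based integer positions
subset = df[mask]                                      # the matching rows themselves

print(labels)
print(positions)
print(subset)
```

Use `labels` with `df.loc[...]` and `positions` with `df.iloc[...]`; the two only coincide when the index is the default 0, 1, 2, ... range.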

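User

Answer 2 · edited 2024-04-19 13:02:16

From what I understand, you might be more comfortable using numpy, a scientific computing package similar to MATLAB.

If you want the indices of the array values that are divisible by 2, the following will work.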

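import numpy

arr = numpy.arange(10)
truth_table = arr % 2 == 0          # boolean mask, True where the value is even
indices = numpy.where(truth_table)  # tuple holding one array of matching positions
values = arr[indices]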

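Working with multidimensional arrays is just as easy:

arr2d = arr.reshape(2, 5)
row_index = 0  # arr2d[row_index] selects one row of the 2-D array
col_indices = numpy.where(arr2d[row_index] % 2 == 0)[0]
col_values = arr2d[row_index, col_indices]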
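To search the whole 2-D array at once instead of one row at a time, `numpy.where` can be applied to the full mask; it then returns one index array per dimension. A small sketch, reusing the same 2x5 array as above (the variable names are my own):

```python
import numpy as np

arr2d = np.arange(10).reshape(2, 5)  # [[0 1 2 3 4], [5 6 7 8 9]]
mask = arr2d % 2 == 0

# One array of row positions and one of column positions.
rows, cols = np.where(mask)

# The same information as (row, col) pairs, closest to R's which(..., arr.ind=TRUE).
pairs = np.argwhere(mask)

# Index arrays can be fed straight back in to recover the values.
values = arr2d[rows, cols]

print(rows, cols)
print(pairs)
print(values)
```

As with the 1-D case, all positions are 0-based.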

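User

Answer 3 · edited 2024-04-19 13:02:16

enumerate() returns an iterator that yields an (index, item) tuple on each iteration, so you can't (and don't need to) call .index() again.

Also, the syntax of your list comprehension is wrong; it should be: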

indexfuture = [(index, x) for (index, x) in enumerate(df['colname']) if x > yesterday]

Test case:

>>> [(index, x) for (index, x) in enumerate("abcdef") if x > "c"]
[(3, 'd'), (4, 'e'), (5, 'f')]

Of course, you don't need to unpack the tuple:

>>> [tup for tup in enumerate("abcdef") if tup[1] > "c"]
[(3, 'd'), (4, 'e'), (5, 'f')]

Unless you are only interested in the indices, in which case you can do:

>>> [index for (index, x) in enumerate("abcdef") if x > "c"]
[3, 4, 5]
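One thing to watch when applying this to a pandas column: enumerate() counts positions from 0 and ignores the Series index. If you want the index labels instead, iterating over `Series.items()` gives (label, value) pairs. A sketch with made-up data (`s` and `cutoff` are hypothetical stand-ins for `df['colname']` and `yesterday`):

```python
import pandas as pd

# Hypothetical column with a non-default index, for illustration only.
s = pd.Series([3, 8, 1, 9], index=[10, 20, 30, 40])
cutoff = 5

# enumerate() yields 0-based positions, regardless of the index.
positions = [i for i, x in enumerate(s) if x > cutoff]

# Series.items() yields (index label, value) pairs.
labels = [label for label, x in s.items() if x > cutoff]

print(positions)
print(labels)
```

For large columns the vectorized boolean mask from the first answer will usually be much faster than a Python-level loop.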