Get the row/column index pairs of a Pandas DataFrame that satisfy a condition

3 votes
1 answer
2649 views
Asked 2025-04-17 20:54

Suppose I have a Pandas DataFrame that looks like the one below. The values are based on a distance matrix.

import numpy as np
import pandas as pd

A = pd.DataFrame([
    (1.0, 0.8, 0.6708203932499369, 0.6761234037828132, 0.7302967433402214),
    (0.8, 1.0, 0.6708203932499369, 0.8451542547285166, 0.9128709291752769),
    (0.6708203932499369, 0.6708203932499369, 1.0, 0.5669467095138409, 0.6123724356957946),
    (0.6761234037828132, 0.8451542547285166, 0.5669467095138409, 1.0, 0.9258200997725514),
    (0.7302967433402214, 0.9128709291752769, 0.6123724356957946, 0.9258200997725514, 1.0),
])

Output:

Out[65]: 
          0         1         2         3         4
0  1.000000  0.800000  0.670820  0.676123  0.730297
1  0.800000  1.000000  0.670820  0.845154  0.912871
2  0.670820  0.670820  1.000000  0.566947  0.612372
3  0.676123  0.845154  0.566947  1.000000  0.925820
4  0.730297  0.912871  0.612372  0.925820  1.000000

I only want the upper triangular part, so I set the lower triangle and the diagonal to NaN.

c2 = A.copy()
c2.values[np.tril_indices_from(c2)] = np.nan

Output:

Out[67]: 

    0    1        2         3         4
0 NaN  0.8  0.67082  0.676123  0.730297
1 NaN  NaN  0.67082  0.845154  0.912871
2 NaN  NaN      NaN  0.566947  0.612372
3 NaN  NaN      NaN       NaN  0.925820
4 NaN  NaN      NaN       NaN       NaN
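
The in-place assignment through `c2.values` relies on `.values` returning a writable view of the frame's data, which is not guaranteed in every pandas version (with copy-on-write enabled, the array is read-only). As a rough sketch of an alternative, a boolean mask built with `np.triu` can be applied with `DataFrame.where`. The small 3x3 frame and the names `mask` and `upper` are illustrative assumptions, not part of the question:

```python
import numpy as np
import pandas as pd

# Small symmetric example frame (illustrative values, not the question's data).
A = pd.DataFrame([[1.0, 0.8, 0.6],
                  [0.8, 1.0, 0.9],
                  [0.6, 0.9, 1.0]])

# Boolean mask that is True strictly above the diagonal (k=1 excludes the diagonal).
mask = np.triu(np.ones(A.shape, dtype=bool), k=1)

# Keep values where the mask is True; every other cell becomes NaN.
upper = A.where(mask)
print(upper)
```

This leaves `A` untouched and returns a new frame, so no explicit `copy()` is needed.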

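Now I want to get the row and column index pairs that satisfy some condition. For example: get the row and column indices where the value is greater than 0.8. For this condition, the output should be [1,3],[1,4],[3,4]. How can I do this?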

1 Answer

4

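You can use numpy's argwhere function, which returns the positions of all elements for which the condition is true. Comparisons against NaN evaluate to False, so the masked cells are skipped automatically: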

In [11]: np.argwhere(c2 > 0.8)
Out[11]: 
array([[1, 3],
       [1, 4],
       [3, 4]])
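
For reference, here is a self-contained sketch of the whole flow using the question's matrix (rounded to six decimals for brevity). It builds the upper triangle with `DataFrame.where` instead of assigning through `.values`, and passes the underlying array to `np.argwhere` explicitly via `to_numpy()` so the result does not depend on how NumPy treats a DataFrame argument. Both choices are assumptions of this sketch, not what the answer above does:

```python
import numpy as np
import pandas as pd

# The question's matrix, rounded to six decimals.
A = pd.DataFrame([
    [1.000000, 0.800000, 0.670820, 0.676123, 0.730297],
    [0.800000, 1.000000, 0.670820, 0.845154, 0.912871],
    [0.670820, 0.670820, 1.000000, 0.566947, 0.612372],
    [0.676123, 0.845154, 0.566947, 1.000000, 0.925820],
    [0.730297, 0.912871, 0.612372, 0.925820, 1.000000],
])

# Keep only the strict upper triangle; all other cells become NaN.
c2 = A.where(np.triu(np.ones(A.shape, dtype=bool), k=1))

# NaN > 0.8 is False, so only real upper-triangle values can match.
pairs = np.argwhere(c2.to_numpy() > 0.8)
print(pairs.tolist())  # [[1, 3], [1, 4], [3, 4]]
```

Each row of `pairs` is one `[row, column]` position, listed in row-major order.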

If you want the index labels or column names (rather than their integer positions), you can use a list comprehension:

[(c2.index[i], c2.columns[j]) for i, j in np.argwhere(c2 > 0.8)]
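
The difference between positions and labels only shows up when the frame does not use the default integer labels. A small sketch, assuming a hypothetical frame labeled `a`, `b`, `c` on both axes (the labels and values are made up for illustration):

```python
import numpy as np
import pandas as pd

labels = ["a", "b", "c"]  # hypothetical labels, not from the question

# Upper triangle already masked: NaN on and below the diagonal.
c2 = pd.DataFrame([[np.nan, 0.85, 0.60],
                   [np.nan, np.nan, 0.90],
                   [np.nan, np.nan, np.nan]],
                  index=labels, columns=labels)

# Integer positions of the matching cells.
positions = np.argwhere(c2.to_numpy() > 0.8)

# Map each (row position, column position) to its (index label, column name).
label_pairs = [(c2.index[i], c2.columns[j]) for i, j in positions]
print(label_pairs)  # [('a', 'b'), ('b', 'c')]
```

With the default `RangeIndex` on both axes, as in the question, the labels and the positions coincide, so the list comprehension returns the same pairs as `np.argwhere`.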
