Querying non-NaN values from a pivoted DataFrame

Asked 2025-04-12 20:28

I have a pivoted DataFrame:

import numpy as np
import pandas as pd
# othercol and othercol2 stand for further length-4 columns defined elsewhere
data = np.column_stack([["alpha", "beta", "gamma", "delta"], ["a", "b", "c", "d"], [0, 1, 2, 3], othercol, othercol2])
df = pd.DataFrame(data, columns=["greek", "latin", "distance", "othercol", "othercol2"])
piv = df.pivot(index="greek", columns="latin", values="distance")

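I want to access values in the pivoted table piv by name, so I'd like to use .loc. Accessing piv.loc["gamma", "c"] works fine, but what if I loop over random combinations of the Greek and Latin labels? Some of those combinations will return NaN.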

In other words, is there a way to make .loc return only the non-NaN value for a given row/column combination?
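
A minimal sketch of the situation, assuming the distances are stored as integers in a small frame built from a dict (the question builds it with np.column_stack and extra columns that are left out here):

```python
import pandas as pd

# Simplified stand-in for the frame in the question (extra columns omitted)
df = pd.DataFrame({
    "greek": ["alpha", "beta", "gamma", "delta"],
    "latin": ["a", "b", "c", "d"],
    "distance": [0, 1, 2, 3],
})
piv = df.pivot(index="greek", columns="latin", values="distance")

hit = piv.loc["gamma", "c"]   # a pair that exists in df
miss = piv.loc["gamma", "a"]  # a pair that never occurred; pivot fills it with NaN
print(hit, miss)
```

Every greek/latin pair that does not appear as a row in df becomes a NaN cell in piv, which is why arbitrary combinations can come back empty.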

Edit: thanks to @mozway for the detailed explanation! Here is a more complete version of the code:

def _get_distance_df():
    for df in list_of_dfs:
        rows = []
        for (a, b), (id_a, id_b) in zip(
            itertools.combinations(df.obj_to_calc_dist_on, 2),
            itertools.combinations(df.id, 2),
        ):
            dist = get_distance(a, b)
            row = [
                df.formula.iloc[0],
                df.description.iloc[0],
                a,
                b,
                dist,
                id_a,
                id_b,
            ]
            rows.append(row)
        newdf = pn.DataFrame(rows, columns=distance_df.columns)
        distance_df = pn.concat([distance_df, newdf], ignore_index=True)
    reduced_distance_df = distance_df[["id1", "id2", "distance"]]
    distance_df = distance_df
    piv_distance_df = distance_df.pivot(
        index="id1", columns="id2", values="distance"
    )

I pivoted the distance table to make the values easier to access, hoping to query it with .loc in the next function:

def main_logic(self):
    for df in self.list_of_dfs:
            id1 = df.iloc[0]["id"]
            id2 = df.iloc[1]["id"]
            try:
                # distance = self.piv_distance_df.loc[id1, id2]
                distance = self.reduced_distance_df.loc[(id1, id2)]
            except KeyError:
                distance = self.reduced_distance_df.loc[(id2, id1)]
                # distance = self.piv_distance_df.loc[id2, id1]
            print(distance)

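But then I realized the pivot isn't actually needed, because reduced_distance_df can be queried just as easily. Computing the distances and then looking them up again by ID feels a bit clumsy, but so far I haven't come up with a better approach.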
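
For .loc[(id1, id2)] to work on the long table, the two ID columns have to form the index. A minimal sketch of that lookup, assuming made-up IDs and distances (the values and the helper name pair_distance are purely illustrative, not taken from the real data):

```python
import pandas as pd

# Hypothetical long-format distance table; each unordered pair is stored once
distance_df = pd.DataFrame({
    "id1": [1, 1, 2],
    "id2": [2, 3, 3],
    "distance": [0.5, 1.2, 0.7],
})

# Index by the ID pair so a tuple can be passed to .loc
dist = distance_df.set_index(["id1", "id2"])["distance"]

def pair_distance(a, b):
    """Look up the distance for (a, b), falling back to (b, a)."""
    try:
        return dist.loc[(a, b)]
    except KeyError:
        return dist.loc[(b, a)]

print(pair_distance(1, 2), pair_distance(3, 1))
```

This assumes each pair occurs only once in the table; duplicated pairs would make .loc return a Series instead of a scalar.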


3 Answers

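If I understand correctly, this code should do what you want.

First, build a list of the combinations you want to check.
Then iterate over that list and look each one up with .loc.
Add an if statement that checks whether the result is NaN; if it isn't, use the value.
If it is NaN, print a message or collect the combination in a list, whichever suits you.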

import pandas as pd

# Example combinations to look up; add more as needed
combinations = [("gamma", "c"), ("alpha", "d")]

for greek, latin in combinations:
    value = piv.loc[greek, latin]
    # pd.notna is True only when the cell holds an actual value
    if pd.notna(value):
        print(f"Value at {greek}, {latin}: {value}")
        # or collect it instead: notnan.append((greek, latin))
    else:
        print(f"Combination {greek}, {latin} is NaN.")
        # likewise, this combination could be collected in a list
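
One thing the loop above does not cover: .loc raises KeyError when a label is not present in piv at all. A minimal sketch that handles both cases, assuming integer distances and a helper named lookup (a hypothetical name, not part of pandas):

```python
import pandas as pd

df = pd.DataFrame({
    "greek": ["alpha", "beta", "gamma", "delta"],
    "latin": ["a", "b", "c", "d"],
    "distance": [0, 1, 2, 3],
})
piv = df.pivot(index="greek", columns="latin", values="distance")

def lookup(table, row, col, default=None):
    """Return table.loc[row, col] if the cell exists and is not NaN, else default."""
    try:
        value = table.loc[row, col]
    except KeyError:
        # row or col is not a label of the pivoted table
        return default
    return value if pd.notna(value) else default

print(lookup(piv, "gamma", "c"))  # existing pair
print(lookup(piv, "gamma", "a"))  # NaN cell, falls back to default
print(lookup(piv, "omega", "z"))  # unknown labels, falls back to default
```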

Answered 2025-04-12 by Python大师


In other words, is there a way to make .loc return the non-NaN value for a given row/column combination?

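I don't think that's possible on the pivoted table itself, but if you reshape the data with DataFrame.stack, the missing values are dropped and you can select from the resulting MultiIndex Series with tuples: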

out = piv.stack()
# pandas 2.1+ also accepts future_stack; False selects the legacy behavior, which drops NaN
#out = piv.stack(future_stack=False)


print (out.loc[('alpha','a')])
0
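
Whether stack drops NaN on its own depends on the pandas version and on the future_stack flag, so a sketch that does not rely on that default is to chain .dropna() explicitly. This assumes integer distances and is a variation on the approach above, not a replacement for it:

```python
import pandas as pd

df = pd.DataFrame({
    "greek": ["alpha", "beta", "gamma", "delta"],
    "latin": ["a", "b", "c", "d"],
    "distance": [0, 1, 2, 3],
})
piv = df.pivot(index="greek", columns="latin", values="distance")

# dropna() is a no-op if stack already removed the NaN cells
out = piv.stack().dropna()

pairs = out.index.tolist()       # every (greek, latin) pair that holds a value
print(pairs)
print(out.loc[("alpha", "a")])   # select with a tuple on the MultiIndex
```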

All combinations that hold a non-NaN value, as a list:

print (out.index.tolist())
[('alpha', 'a'), ('beta', 'b'), ('delta', 'd'), ('gamma', 'c')]

For random picks:

import random
rn = random.sample(out.index.tolist(), 2)
print (rn)
[('delta', 'd'), ('alpha', 'a')]

Answered 2025-04-12 by Python大师


If you want to avoid NaN (missing values), you can use stack, which drops all the NaN entries:

tmp = piv.stack()

Output:

greek  latin
alpha  a        0
beta   b        1
delta  d        3
gamma  c        2
dtype: object

You can then index directly:

tmp.loc[('alpha', 'a')]

Or, if you want to handle combinations that may be missing:

tmp.get(('alpha', 'a'), 'missing')

Output: 0
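
Applied to the loop from the question, the same get call can be run over every greek/latin combination. A minimal sketch, assuming integer distances and None as the marker for a missing pair:

```python
import itertools
import pandas as pd

df = pd.DataFrame({
    "greek": ["alpha", "beta", "gamma", "delta"],
    "latin": ["a", "b", "c", "d"],
    "distance": [0, 1, 2, 3],
})
piv = df.pivot(index="greek", columns="latin", values="distance")
tmp = piv.stack().dropna()

# Try every row/column combination; absent pairs yield the default (None)
results = {
    (greek, latin): tmp.get((greek, latin), None)
    for greek, latin in itertools.product(piv.index, piv.columns)
}

# Keep only the combinations that actually hold a value
present = {pair: value for pair, value in results.items() if value is not None}
print(present)
```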

Note that if you want to pick an item at random, you don't need to know the index labels; just use sample:

tmp = piv.stack()
chosen = tmp.sample(1)
chosen.index[0]
# ('beta', 'b')

chosen.squeeze()
# 1

Or get several values at once:

tmp.sample(5, replace=True)

Output:

greek  latin
delta  d        3
alpha  a        0
       a        0
gamma  c        2
beta   b        1
dtype: object

If your pairs come in arbitrary order, the best option is to design your loop so that the combinations are supplied in the right order.

If you can't do that, you can handle it with try/except:

idx, col = 'a', 'alpha'
try:
    piv.loc[idx, col]
except KeyError:
    piv.loc[col, idx]

Alternatively, you can use stack again and turn the index into frozensets:

idx, col = 'a', 'alpha'

tmp = piv.stack()
tmp.index = tmp.index.map(frozenset)
tmp.get(frozenset((idx, col)), None)

Output: 0
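
The same order-independent idea can also be sketched with a plain dict keyed by frozenset instead of a frozenset-indexed Series. This is a swapped-in variant, assuming integer distances and a helper named get_unordered (a hypothetical name):

```python
import pandas as pd

df = pd.DataFrame({
    "greek": ["alpha", "beta", "gamma", "delta"],
    "latin": ["a", "b", "c", "d"],
    "distance": [0, 1, 2, 3],
})
piv = df.pivot(index="greek", columns="latin", values="distance")
tmp = piv.stack().dropna()

# frozenset({"alpha", "a"}) == frozenset({"a", "alpha"}), so order no longer matters
unordered = {frozenset(pair): value for pair, value in tmp.items()}

def get_unordered(a, b, default=None):
    """Return the value stored for the pair {a, b}, in either order."""
    return unordered.get(frozenset((a, b)), default)

print(get_unordered("a", "alpha"), get_unordered("alpha", "a"))
```

This only makes sense when (a, b) and (b, a) are meant to carry the same value, as with symmetric distances; a pair of two identical labels collapses to a one-element set.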

Answered 2025-04-12 by Python大师

