Find a value in a DataFrame and cross-reference the value in the corresponding column

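| Index | X_1 | X_2 | X_3 | W_1  | W_2  | W_3  | **W**    |
|-------|-----|-----|-----|------|------|------|----------|
| 1     | IEZ | XOP | ABC | 0.42 | 0.18 | 0.40 | **0.40** |
| 2     | PXJ | ABC | XES | 0.47 | 0.12 | 0.41 | **0.12** |
| 3     | ABC | RYE | PXE | 0.23 | 0.33 | 0.44 | **0.23** |
| 4     | XOP | IEZ | ABC | 0.62 | 0.20 | 0.18 | **0.18** |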

3 answers

User

Answer #1 · edited 2024-05-23 18:31:37

import numpy as np
import pandas as pd

# df is your dataframe

# idxs = np.argwhere(df.values == "ABC") will also work
# if "ABC" only appears once per row.
idxs = np.argwhere(df.values[:, :3] == "ABC")
idxs[:, 1] += 3
w = df.values[idxs[:, 0], idxs[:, 1]]
df = df.assign(W=w)

Or:

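# Boolean mask over the X columns (assumed to be the first three columns)
matches = (df.iloc[:, :3] == "ABC").to_numpy()
# Boolean indexing walks the array row by row, so this yields one weight
# per row as long as "ABC" appears exactly once in each row.
w = df.iloc[:, 3:].to_numpy()[matches]
df = df.assign(W=w)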
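
For reference, here is a self-contained sketch of the np.argwhere variant run on the sample data. The way df is built here is an assumption reconstructed from the table in the question, and the names x, w and result are only illustrative. Selecting the X and W columns by name avoids the `+= 3` column offset.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question's table (an assumption).
df = pd.DataFrame({
    "X_1": ["IEZ", "PXJ", "ABC", "XOP"],
    "X_2": ["XOP", "ABC", "RYE", "IEZ"],
    "X_3": ["ABC", "XES", "PXE", "ABC"],
    "W_1": [0.42, 0.47, 0.23, 0.62],
    "W_2": [0.18, 0.12, 0.33, 0.20],
    "W_3": [0.40, 0.41, 0.44, 0.18],
})

x = df[["X_1", "X_2", "X_3"]].to_numpy()
w = df[["W_1", "W_2", "W_3"]].to_numpy()

# Each row of idxs is a (row, column) position where x equals "ABC",
# listed in row-major order.
idxs = np.argwhere(x == "ABC")

# Pick the weight at the same (row, column) position for every match.
result = df.assign(W=w[idxs[:, 0], idxs[:, 1]])
print(result["W"].tolist())  # [0.4, 0.12, 0.23, 0.18]
```

This still relies on "ABC" appearing exactly once per row; otherwise the number of matches will not equal the number of rows and assign will raise an error.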

User

Answer #2 · edited 2024-05-23 18:31:37

Another approach:

df = pd.DataFrame({'X_1' : ['IEZ', 'PXJ', 'ABC', 'XOP'],  
                   'X_2' : ['XOP', 'ABC', 'RYE', 'IEZ'], 
                   'X_3' : ['ABC', 'XES','PXE', 'ABC'],
                   'W_1' :  [0.42, 0.47, 0.23, 0.62],
                   'W_2' : [0.18, 0.12, 0.33, 0.20],
                   'W_3' :  [0.40, 0.41, 0.44, 0.18]})

First, take the numeric columns:

num_columns = df.loc[:,'W_1':'W_3']

Next, build a boolean mask from columns X_1 through X_3:

df_mask = (df.loc[:,'X_1':'X_3']=='ABC').values

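Finally, use the DataFrame mask method, which returns NaN where the condition is True and the cell value where it is False. Because we pass the inverted mask, only the weights that line up with 'ABC' survive. We then sum each row and assign the result to the original DataFrame: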

df['W'] = num_columns.mask(~df_mask).sum(axis=1)
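
To make the masking step easier to follow, here is a small sketch that keeps the intermediate result around. It assumes the sample df defined above; the name masked is only illustrative.

```python
import pandas as pd

# Same sample data as in this answer.
df = pd.DataFrame({
    "X_1": ["IEZ", "PXJ", "ABC", "XOP"],
    "X_2": ["XOP", "ABC", "RYE", "IEZ"],
    "X_3": ["ABC", "XES", "PXE", "ABC"],
    "W_1": [0.42, 0.47, 0.23, 0.62],
    "W_2": [0.18, 0.12, 0.33, 0.20],
    "W_3": [0.40, 0.41, 0.44, 0.18],
})

num_columns = df.loc[:, "W_1":"W_3"]
df_mask = (df.loc[:, "X_1":"X_3"] == "ABC").to_numpy()

# Cells whose X value is not 'ABC' become NaN; the matching weight is kept.
masked = num_columns.mask(~df_mask)
print(masked)

# sum(axis=1) skips NaN by default, so each row collapses to its one kept weight.
row_sums = masked.sum(axis=1)
print(row_sums.tolist())
```

With this data every row of masked holds two NaN values and one weight, which is why the row sum returns that weight unchanged.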

Of course, this can be combined into a single statement:

df['W'] = (df.loc[:,'W_1':'W_3']
            .mask(~(df.loc[:,'X_1':'X_3']=='ABC').values)
            .sum(axis=1))

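Edit:

Of course, this only works if each row contains exactly one 'ABC'; you may want to check for that. A row with no match would sum to 0, and a row with several matches would add their weights together.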
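
One possible way to run that check is to count the matches per row before computing W. This is only a sketch: the helper count_matches and the small demo frame are hypothetical and not part of the original answer.

```python
import pandas as pd

def count_matches(df, x_cols, target):
    """Return, for each row, how many of the given columns equal target."""
    return (df[x_cols] == target).sum(axis=1)

# Hypothetical demo data: row 0 is fine, row 1 has two matches, row 2 has none.
demo = pd.DataFrame({
    "X_1": ["ABC", "ABC", "IEZ"],
    "X_2": ["XOP", "ABC", "XOP"],
})

counts = count_matches(demo, ["X_1", "X_2"], "ABC")

# Index labels of the rows that violate the "exactly one match" assumption.
bad_rows = counts[counts != 1].index.tolist()
print(bad_rows)  # [1, 2]
```

If bad_rows is not empty, the summed W for those rows should not be trusted.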

User

Answer #3 · edited 2024-05-23 18:31:37

Interesting. I'm sure there's a better way, but:

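x_cols = [x for x in df.columns if x.startswith('X_')]
res_dfs = []
for col in x_cols:
    idx = col.split("_")[1]  # "X_1" -> "1"
    # Pair each X column with its matching W column
    xw = df[[col, f"W_{idx}"]]
    # Keep only the rows where this X column holds 'ABC'
    xw = xw.loc[xw[col] == 'ABC']
    xw = xw[[f"W_{idx}"]].rename(columns={f"W_{idx}": 'W'})
    # Rows without a match get NaN in 'W' and are dropped
    res = df.join(xw).dropna()
    res_dfs.append(res)
# Combine the per-column pieces and restore the original row order
df = pd.concat(res_dfs).sort_index()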

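Basically, I iterate over the X columns and their matching W columns, find the rows where the X value is 'ABC', and fill a new 'W' column with the matching W value.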

I'm writing this from my phone, so I can't test it, but that's the general idea.
