Replace values in a DataFrame based on values from another DataFrame

Posted 2024-04-27 20:32:39


How can I replace the values in one DataFrame using a second, lookup DataFrame?

This is DataFrame A, whose values I want to replace:

  InfoType  IncidentType    DangerType
0   NaN          A             NaN
1   NaN          C             NaN
2   NaN          B            C
3   NaN          B            NaN

This is the lookup table:

    ID  ParamCode   ParamValue  ParmDesc1   ParamDesc2  SortOrder   ParamStatus
0   1   IncidentType    A       ABC            DEF          1            1
1   2   IncidentType    B       GHI            JKL          2            1
2   3   IncidentType    C       MNO            PQR          7            1
2   3   DangerType      C       STU            VWX          6            1

Expected output:

  InfoType  IncidentType    DangerType
0   NaN          ABC           NaN
1   NaN          MNO           NaN
2   NaN          GHI           STU
3   NaN          GHI           NaN

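Note that ParamCode holds the column name, and I need to replace the values in the corresponding column of DataFrame A with ParmDesc1. Any column in DataFrame A may contain NaN; I do not want to drop those rows, so they should be left as they are.

This is what I have tried: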

ntf_cols = ['InfoType','IncidentType','DangerType']
for c in ntf_cols:
    if (c in ntf.columns) & (c in param['ParamCode'].values):
        paramValue = param['ParamValue'].unique()
        for idx, pv in enumerate(paramValue):
            ntf['NewIncidentType'] = pd.np.where(ntf.IncidentType.str.contains(pv), param['ParmDesc1'].values, "whatever")

Error:

ValueError: operands could not be broadcast together with shapes (25,) (13,) ()


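2 answers

Build a dict from the lookup table, then use it to replace the column values in the original DataFrame. Assume the original DataFrame is df1 and the lookup table is df2.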

...
dict_map = dict(zip(df2.ParamCode + "-" + df2.ParamValue, df2.ParmDesc1))

df1['IncidentType'] = ("IncidentType" +'-'+ df1.IncidentType).replace(dict_map)
df1['DangerType'] = ("DangerType" +'-'+ df1.DangerType).replace(dict_map)
...

Edit: Lambda's answer gave me an idea for how to do this across many columns that follow the same pattern:

import pandas as pd

df1 = pd.DataFrame(dict(
    InfoType = [None, None, None, None],
    IncidentType = 'A C B B'.split(),
    DangerType = [None, None, 'C', None],
))

df2 = pd.DataFrame(dict(
    ParamCode = 'IncidentType IncidentType IncidentType DangerType'.split(),
    ParamValue  = 'A B C C'.split(),
    ParmDesc1 = 'ABC GHI MNO STU'.split(),
))


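# Skip the first column (InfoType); every later column is looked up by name.
for col in df1.columns[1:]:
    # Map ParamValue -> ParmDesc1 using only the rows whose ParamCode is this column.
    dict_map = dict(
        df2[df2.ParamCode == col][['ParamValue','ParmDesc1']].to_records(index=False)
    )
    # replace() leaves NaN/None and any unmapped values untouched.
    df1[col] = df1[col].replace(dict_map)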

print(df1)

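This assumes that every column after the first one in df1 needs updating, and that each of those column names appears as a value in df2's 'ParamCode' column.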
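If those assumptions do not hold, a slightly more defensive variant might look like the sketch below. It checks each column name against `ParamCode` and skips columns with no lookup rows, and it leaves any value missing from the lookup table unchanged. The helper name `apply_lookup` and its keyword arguments are my own, not part of pandas or the question.

```python
import pandas as pd

def apply_lookup(df, lookup, code_col="ParamCode",
                 value_col="ParamValue", desc_col="ParmDesc1"):
    """Return a copy of df with coded values replaced by their descriptions.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    out = df.copy()
    for col in out.columns:
        rows = lookup[lookup[code_col] == col]
        if rows.empty:
            continue  # no lookup entries for this column, leave it untouched
        mapping = dict(zip(rows[value_col], rows[desc_col]))
        # Fall back to the original value (including NaN/None) when unmapped.
        out[col] = out[col].map(lambda v: mapping.get(v, v))
    return out

df1 = pd.DataFrame(dict(
    InfoType=[None, None, None, None],
    IncidentType='A C B B'.split(),
    DangerType=[None, None, 'C', None],
))

df2 = pd.DataFrame(dict(
    ParamCode='IncidentType IncidentType IncidentType DangerType'.split(),
    ParamValue='A B C C'.split(),
    ParmDesc1='ABC GHI MNO STU'.split(),
))

result = apply_lookup(df1, df2)
print(result)
```

Returning a copy rather than modifying df1 in place is a design choice here; assign the result back to df1 if in-place behaviour is preferred.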



You can solve this with a couple of custom functions and pandas.Series.apply():

import pandas as pd

def find_incident_type(x):
    if pd.isna(x):
        return x
    return df2[
        (df2['ParamCode'] == 'IncidentType') & (df2['ParamValue']==x)
    ]["ParmDesc1"].values[0]


def find_danger_type(x):
    if pd.isna(x):
        return x
    return df2[
        (df2['ParamCode'] == 'DangerType') & (df2['ParamValue']==x)
    ]["ParmDesc1"].values[0]


df1 = pd.DataFrame(dict(
    InfoType = [None, None, None, None],
    IncidentType = 'A C B B'.split(),
    DangerType = [None, None, 'C', None],
))

df2 = pd.DataFrame(dict(
    ParamCode = 'IncidentType IncidentType IncidentType DangerType'.split(),
    ParamValue  = 'A B C C'.split(),
    ParmDesc1 = 'ABC GHI MNO STU'.split(),
))

df1['IncidentType'] = df1['IncidentType'].apply(find_incident_type)
df1['DangerType'] = df1['DangerType'].apply(find_danger_type)

print(df1)


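There is most likely a more efficient way to do this. Hopefully someone who knows one will share it.

Also, the reference to df2 in the outer scope is hard-coded into the custom functions, so they only work when a variable with that name exists in the outer scope. If you don't want the functions to depend on that reference, pass the lookup DataFrame in through the args parameter of pandas.Series.apply.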
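As a rough sketch of that idea, the two functions can be merged into one that receives the lookup table and the ParamCode as extra positional arguments via `args`. The function name `find_desc` and its parameter names are hypothetical. Unlike the functions above, this version returns the original value when no lookup row matches, instead of raising an IndexError.

```python
import pandas as pd

def find_desc(x, lookup, code):
    """Look up ParmDesc1 for value x under the given ParamCode (hypothetical helper)."""
    if pd.isna(x):
        return x  # leave missing values alone
    match = lookup[(lookup['ParamCode'] == code) & (lookup['ParamValue'] == x)]
    # Keep the original value if the lookup table has no matching row.
    return match['ParmDesc1'].iloc[0] if not match.empty else x

df1 = pd.DataFrame(dict(
    InfoType=[None, None, None, None],
    IncidentType='A C B B'.split(),
    DangerType=[None, None, 'C', None],
))

df2 = pd.DataFrame(dict(
    ParamCode='IncidentType IncidentType IncidentType DangerType'.split(),
    ParamValue='A B C C'.split(),
    ParmDesc1='ABC GHI MNO STU'.split(),
))

# args supplies the extra positional arguments that follow each element x.
for col in ['IncidentType', 'DangerType']:
    df1[col] = df1[col].apply(find_desc, args=(df2, col))

print(df1)
```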
