What is the best way to replace thousands of rows of ID names with their corresponding names in Python?

UniProtID  NAME
Q15173     PPP2R5B
P30154     PPP2R1B
P63151     PPP2R2A

DrugBankID  Name       Type         UniProtID  UniProt Name
DB00001     Lepirudin  BiotechDrug  P00734     Prothrombin
DB00002     Cetuximab  BiotechDrug  P00533     Epidermal growth factor receptor
DB00002     Cetuximab  BiotechDrug  O75015     Low affinity immunoglobulin gamma Fc region receptor III-B

2 answers

User

#1 · edited 2024-05-23 22:45:58

I think you need map with a Series created by set_index; IDs that do not match come back as NaN:

# data changed so that some IDs match
print (df1)
  UniProtID     NAME
0    O75015  PPP2R5B
1    P00734  PPP2R1B
2    P63151  PPP2R2A

df2['UniProt Name'] = df2['UniProtID'].map(df1.set_index('UniProtID')['NAME'])
print (df2)
  DrugBankID       Name         Type UniProtID UniProt Name
0    DB00001  Lepirudin  BiotechDrug    P00734      PPP2R1B
1    DB00002  Cetuximab  BiotechDrug    P00533          NaN
2    DB00002  Cetuximab  BiotechDrug    O75015      PPP2R5B

If you want to keep the original value where the result would be NaN:

# run this on the original df2, before 'UniProt Name' was overwritten above
df2['UniProt Name'] = (df2['UniProtID'].map(df1.set_index('UniProtID')['NAME'])
                                       .fillna(df2['UniProt Name']))
print (df2)
  DrugBankID       Name         Type UniProtID  \
0    DB00001  Lepirudin  BiotechDrug    P00734   
1    DB00002  Cetuximab  BiotechDrug    P00533   
2    DB00002  Cetuximab  BiotechDrug    O75015   

                       UniProt Name  
0                           PPP2R1B  
1  Epidermal growth factor receptor  
2                           PPP2R5B
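
To try the two snippets above end to end, here is a minimal, self-contained sketch. The frames are rebuilt by hand from the tables in the question (with the IDs in df1 changed as above so that some rows match), and the names lookup, mapped and filled are mine, not from the answer. As far as I know, map with a Series needs the lookup index to be unique, so the sketch drops duplicate IDs first as a precaution:

```python
import pandas as pd

# Lookup table (IDs changed as in the answer so that some rows match)
df1 = pd.DataFrame({
    "UniProtID": ["O75015", "P00734", "P63151"],
    "NAME": ["PPP2R5B", "PPP2R1B", "PPP2R2A"],
})

# Table whose 'UniProt Name' column should be replaced
df2 = pd.DataFrame({
    "DrugBankID": ["DB00001", "DB00002", "DB00002"],
    "Name": ["Lepirudin", "Cetuximab", "Cetuximab"],
    "Type": ["BiotechDrug", "BiotechDrug", "BiotechDrug"],
    "UniProtID": ["P00734", "P00533", "O75015"],
    "UniProt Name": [
        "Prothrombin",
        "Epidermal growth factor receptor",
        "Low affinity immunoglobulin gamma Fc region receptor III-B",
    ],
})

# Series indexed by ID; duplicates dropped as a precaution (assumption:
# one name per ID is what is wanted)
lookup = df1.drop_duplicates("UniProtID").set_index("UniProtID")["NAME"]

# Unmatched IDs become NaN
mapped = df2["UniProtID"].map(lookup)

# Fall back to the existing name where no match was found
filled = mapped.fillna(df2["UniProt Name"])

df2["UniProt Name"] = filled
print(df2)
```

Keeping mapped and filled as separate variables makes it easy to check how many IDs had no match (mapped.isna().sum()) before overwriting the column.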

Solution with merge: do a left join, fill the gaps with fillna or combine_first, and finally remove the helper column with drop:

df = pd.merge(df2, df1, on="UniProtID", how='left')
df['UniProt Name'] = df['NAME'].fillna(df['UniProt Name'])
#alternative
#df['UniProt Name'] = df['NAME'].combine_first(df['UniProt Name'])
df.drop('NAME', axis=1, inplace=True)
print (df)
  DrugBankID       Name         Type UniProtID  \
0    DB00001  Lepirudin  BiotechDrug    P00734   
1    DB00002  Cetuximab  BiotechDrug    P00533   
2    DB00002  Cetuximab  BiotechDrug    O75015   

                       UniProt Name  
0                           PPP2R1B  
1  Epidermal growth factor receptor  
2                           PPP2R5B

# alternative: replace the old column outright, leaving NaN for unmatched IDs
df = pd.merge(df2, df1, on="UniProtID", how='left')
df = df.drop('UniProt Name', axis=1).rename(columns={'NAME':'UniProt Name'})
print (df)
  DrugBankID       Name         Type UniProtID UniProt Name
0    DB00001  Lepirudin  BiotechDrug    P00734      PPP2R1B
1    DB00002  Cetuximab  BiotechDrug    P00533          NaN
2    DB00002  Cetuximab  BiotechDrug    O75015      PPP2R5B
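
If it helps to see which IDs failed to match in the merge approach, pd.merge accepts indicator=True (adds a _merge column) and validate= (raises if the key relationship is not what you expect). A rough sketch, assuming hand-built frames based on the question's tables; the names merged and unmatched are only illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    "UniProtID": ["O75015", "P00734", "P63151"],
    "NAME": ["PPP2R5B", "PPP2R1B", "PPP2R2A"],
})
df2 = pd.DataFrame({
    "DrugBankID": ["DB00001", "DB00002", "DB00002"],
    "Name": ["Lepirudin", "Cetuximab", "Cetuximab"],
    "Type": ["BiotechDrug", "BiotechDrug", "BiotechDrug"],
    "UniProtID": ["P00734", "P00533", "O75015"],
    "UniProt Name": [
        "Prothrombin",
        "Epidermal growth factor receptor",
        "Low affinity immunoglobulin gamma Fc region receptor III-B",
    ],
})

# validate="many_to_one" raises if an ID appears more than once in df1;
# indicator=True adds a '_merge' column ('both' or 'left_only' for a left join)
merged = pd.merge(df2, df1, on="UniProtID", how="left",
                  validate="many_to_one", indicator=True)

# IDs from df2 that had no partner in df1
unmatched = merged.loc[merged["_merge"] == "left_only", "UniProtID"].tolist()
print(unmatched)

# Same clean-up as in the answer, plus dropping the indicator column
merged["UniProt Name"] = merged["NAME"].fillna(merged["UniProt Name"])
merged = merged.drop(columns=["NAME", "_merge"])
print(merged)
```

With thousands of rows, printing unmatched (or just its length) is a cheap sanity check before trusting the replaced column.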

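User

#2 · edited 2024-05-23 22:45:58

A more general way to solve this is to perform an SQL-like join on the two tables.

Note: this may be expensive for larger datasets; I have not tested the performance.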

import pandas as pd

left = pd.DataFrame({"UniProtID": ["Q15173", "P30154", "P63151"],
                     "Name": ["PPP2R5B", "PPP2R1B", "PPP2R2A"]})

right = pd.DataFrame({"UniProtID": ["Q15173", "P30154", "P63151"],
                      "UniProt Name": ["Prothrombin", "Epidermal growth factor receptor", "Low affinity immunoglobulin gamma Fc region receptor III-B"],
                      "Type": ["BiotechDrug", "BiotechDrug", "BiotechDrug"],
                      "DrugBankID": ["DB00001", "DB00002", "DB00003"]})

result = pd.merge(left, right, on="UniProtID")

Reference: https://pandas.pydata.org/pandas-docs/stable/merging.html#overlapping-value-columns
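
One thing to keep in mind with a plain pd.merge call: the default is how="inner", so rows whose ID has no partner in the other table are dropped from the result. A small sketch of the difference, assuming made-up frames named names and drugs (cut down from the question's data, not taken from either answer):

```python
import pandas as pd

names = pd.DataFrame({
    "UniProtID": ["O75015", "P00734", "P63151"],
    "NAME": ["PPP2R5B", "PPP2R1B", "PPP2R2A"],
})
drugs = pd.DataFrame({
    "DrugBankID": ["DB00001", "DB00002", "DB00002"],
    "UniProtID": ["P00734", "P00533", "O75015"],
})

# Default inner join: only IDs present in both tables survive
inner = pd.merge(drugs, names, on="UniProtID")

# Left join: every row of drugs is kept, NAME is NaN where nothing matched
left_join = pd.merge(drugs, names, on="UniProtID", how="left")

print(len(inner), len(left_join))
```

If every row of the drug table has to be preserved, how="left" is probably the safer choice.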
