Speeding up a vlookup-like operation with pandas in Python

# 1. Find the soil type corresponding to the mukey
tmp = sgo_df.type.values[sgo_df['mkey'] == int(row['SGO10GEO'])]
if tmp.size > 0:
    s_type = 'ST'+tmp[0]
    val = int(row['VALUE'])
    # 2. Obtain hmu value
    tmp_val = merged_df[s_type].values[merged_df['GRID'] == int(row['GRID'])]
    if tmp_val.size > 0:
        hmu_val = tmp_val[0]
        # 4. Output into data frame: VALUE, hmu value
        lup_out.writerow([val,s_type,hmu_val])
    else:
        err_out.writerow([merged_df['GRID'], type, row['GRID']])

1 answer

User

#1 · Posted 2024-06-16 12:35:09

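You need to use the merge operation in pandas to get better performance. I can't test the code below because I don't have the data, but it should at least help you get the idea: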

import pandas as pd

# Load the three tables; read_csv gives each frame a default integer index
dbase1_df = pd.read_csv('dbase1_file.csv')
sgo_df = pd.read_csv('sgo_df.csv')
merged_df = pd.read_csv('merged_df.csv')

# pd.merge joins on columns that have the same name in both frames, so rename the SGO10GEO column of dbase1_df to mkey to match sgo_df

dbase1_df = dbase1_df.rename(columns={'SGO10GEO': 'mkey'})

# Join each dbase1_df row to its soil type via mkey (inner join: rows without a match are dropped)
Step1_Merge = pd.merge(dbase1_df, sgo_df, on='mkey')

# Add a column that prefixes the soil type with 'ST' so it matches the column names in merged_df
Step1_Merge['type_2'] = 'ST' + Step1_Merge['type'].astype(str)

# Reshape merged_df from wide to long (columns become rows) so that a second merge can do the lookup
grid_ids = merged_df[['GRID']]
long_df = pd.merge(merged_df.stack().reset_index(1), grid_ids, left_index=True, right_index=True)

# Rename the automatically generated column to type_2 so it matches Step1_Merge; column 0 holds the looked-up value
long_df.columns = ['type_2', 0, 'GRID']

# Look up the value for each (type_2, GRID) pair
result = pd.merge(Step1_Merge, long_df, on=['type_2', 'GRID'])
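
Since the code above is untested, here is a small self-contained sketch of the same two-merge idea that can be run as-is. The sample values are made up (the real files are not shown in the question), and two things differ from the answer: it uses `melt` for the wide-to-long reshape instead of `stack`, and it uses left joins with `indicator=True` so the rows that find no match can be collected separately, roughly what the original loop sends to `err_out`. Treat it as an illustration under those assumptions, not as the exact solution for the real data.

```python
import pandas as pd

# Hypothetical sample data standing in for the three CSV files.
dbase1_df = pd.DataFrame({
    "VALUE": [1, 2, 3, 4],
    "GRID": [10, 10, 20, 30],
    "mkey": [100, 200, 100, 999],  # 999 has no soil type in sgo_df
})
sgo_df = pd.DataFrame({"mkey": [100, 200], "type": ["A", "B"]})
merged_df = pd.DataFrame({
    "GRID": [10, 20],              # GRID 30 is left out on purpose
    "STA": [0.1, 0.2],
    "STB": [0.3, 0.4],
})

# Step 1: attach the soil type; a left join keeps rows with no match.
step1 = dbase1_df.merge(sgo_df, on="mkey", how="left")
step1["type_2"] = "ST" + step1["type"]  # stays missing where type is missing

# Step 2: wide -> long, one row per (GRID, type_2) pair.
long_df = merged_df.melt(id_vars="GRID", var_name="type_2", value_name="hmu")

# Step 3: look up hmu; indicator=True adds a _merge column that says
# whether each left row found a partner ("both") or not ("left_only").
step2 = step1.merge(long_df, on=["GRID", "type_2"], how="left", indicator=True)

found = step2["_merge"] == "both"
lookup = step2.loc[found, ["VALUE", "type_2", "hmu"]].reset_index(drop=True)
errors = step2.loc[~found, ["VALUE", "GRID", "mkey"]].reset_index(drop=True)

print(lookup)
print(errors)
```

`lookup` and `errors` can then be written out with `to_csv` in place of the row-by-row `writerow` calls.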
