Merging two DataFrames with similar columns

Item ID  Order ID  Name  Location
     21         1  John        IL
     22         2  John        LA
     24         2   Ron        LA
     25         3   Ron        LA
     29         5  John        IL

Item ID  Order ID  Name  Location   Type
     20         1  John        IL  cable
     22         2   Ron        LA  cable
     23         2   Ron        LA    Box
     26         3   Ron        LA  cable
     27       N/A   N/A        IL    Box
     29         5  John        IL    Box

Item ID  Order ID      Name  Location   Type
     20         1      John        IL  Cable
     21         4      John        IL    N/A
     22         2  John/Ron        LA  Cable
     23         2       Ron        LA    Box
     24         2       Ron        LA    N/A
     25         3       Ron        LA    N/A
     26         3       Ron        LA  Cable
     27       N/A       N/A        IL    Box
     28       N/A       N/A       N/A    N/A
     29         5      John        IL    Box
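
For anyone who wants to try the answers below, here is a minimal sketch that rebuilds the first two tables as DataFrames. The names `df1` and `df2` are assumptions borrowed from the answers, and the "N/A" cells are entered as missing values. Note that answer #2 spells the key column `Item_ID`, so the column names would need adjusting there.

```python
import pandas as pd

# Sample frames transcribed from the first two tables above.
# "N/A" cells are represented as missing values (None -> NaN).
df1 = pd.DataFrame({
    "Item ID": [21, 22, 24, 25, 29],
    "Order ID": [1, 2, 2, 3, 5],
    "Name": ["John", "John", "Ron", "Ron", "John"],
    "Location": ["IL", "LA", "LA", "LA", "IL"],
})

df2 = pd.DataFrame({
    "Item ID": [20, 22, 23, 26, 27, 29],
    "Order ID": [1, 2, 2, 3, None, 5],
    "Name": ["John", "Ron", "Ron", "Ron", None, "John"],
    "Location": ["IL", "LA", "LA", "LA", "IL", "IL"],
    "Type": ["cable", "cable", "Box", "cable", "Box", "Box"],
})
```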

3 Answers

User

#1 · Edited 2024-05-14 03:48:15

I found this in another post, and with a small change it did what I wanted. I'm posting the function (def) version for anyone else who needs it.

import pandas as pd


def _as_text(value):
    # Missing values become empty strings; everything else is compared as stripped text.
    return "" if pd.isna(value) else str(value).strip()


# combine the common columns
def merge_dfs(dfs, keyCol='Request ID'):
    """Outer-merge a list of DataFrames on keyCol, joining conflicting values with '~'."""
    mergeDf = dfs[0]
    for right in dfs[1:]:
        left = mergeDf
        # Index.intersection/union replace `&` and `|`, which recent pandas
        # versions no longer treat as set operations on column indexes.
        commonCols = [c for c in left.columns.intersection(right.columns) if c != keyCol]
        finalCols = list(left.columns.union(right.columns, sort=False))

        mergeDf = left.merge(right, on=keyCol, how='outer', suffixes=('_left', '_right'))

        # combine the common columns
        for col in commonCols:
            combined = []
            for leftRaw, rightRaw in zip(mergeDf[col + '_left'], mergeDf[col + '_right']):
                leftVal, rightVal = _as_text(leftRaw), _as_text(rightRaw)
                if not rightVal or rightVal in leftVal.split('~'):
                    # nothing new on the right, keep what we already have
                    combined.append(leftVal)
                elif not leftVal:
                    # only the right side has a value (avoids a leading '~')
                    combined.append(rightVal)
                else:
                    combined.append(leftVal + '~' + rightVal)
            mergeDf[col] = combined

        # only use the finalCols
        mergeDf = mergeDf[finalCols]
    return mergeDf
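
As an alternative to the merge-and-loop approach above, the same "join distinct values with `~`" idea can be sketched with `pd.concat` plus `groupby`. This is a swapped-in technique, not the poster's method, and the names `join_unique` and `merge_by_groupby` are made up for illustration. It assumes each key should end up as exactly one row, so rows sharing a key are collapsed rather than cross-joined.

```python
import pandas as pd


def join_unique(values, sep="~"):
    """Join the distinct non-missing values of a Series, in order of appearance."""
    seen = []
    for value in values.dropna():
        text = str(value).strip()
        if text and text not in seen:
            seen.append(text)
    return sep.join(seen)


def merge_by_groupby(dfs, key):
    """Stack the frames, then collapse every key to one row of joined values."""
    stacked = pd.concat(dfs, ignore_index=True)
    return stacked.groupby(key).agg(join_unique).reset_index()


# Tiny hypothetical frames, not the data from the question.
a = pd.DataFrame({"Item ID": [21, 22], "Name": ["John", "John"]})
b = pd.DataFrame({"Item ID": [22, 23], "Name": ["Ron", "Ron"], "Type": ["cable", "Box"]})
result = merge_by_groupby([a, b], key="Item ID")
print(result)
```

Like the loop version, every non-key column comes back as text, so numeric columns would need converting afterwards.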

User

#2 · Edited 2024-05-14 03:48:15

This might help:

pd.concat([df1, df2]).sort_values('Item_ID').drop_duplicates(['Item_ID'], keep='last')

   Item_ID Location  Name  Order_ID   Type
0       20       IL  John       1.0  cable
0       21       IL  John       1.0    NaN
1       22       LA   Ron       2.0  cable
2       23       LA   Ron       2.0    Box
2       24       LA   Ron       2.0    NaN
3       25       LA   Ron       3.0    NaN
3       26       LA   Ron       3.0  cable
4       27       IL   NaN       NaN    Box
5       29       IL  John       5.0    Box
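
One thing to keep in mind with this approach: `drop_duplicates` keeps whole rows, so for a repeated ID every field comes from whichever row sorts last, and nothing is combined cell by cell. Which row that is depends on how ties are ordered by the sort. A small sketch on made-up frames, assuming a stable sort (`kind="mergesort"`) is wanted so the `df2` row reliably wins:

```python
import pandas as pd

# Hypothetical two-row frames; Item_ID 22 appears in both.
df1 = pd.DataFrame({"Item_ID": [21, 22], "Name": ["John", "John"]})
df2 = pd.DataFrame({"Item_ID": [22, 23], "Name": ["Ron", "Ron"], "Type": ["cable", "Box"]})

# A stable sort keeps tied rows in concat order (df1 first, df2 second),
# so keep="last" deterministically selects the df2 row for Item_ID 22.
stacked = pd.concat([df1, df2]).sort_values("Item_ID", kind="mergesort")
deduped = stacked.drop_duplicates(["Item_ID"], keep="last").reset_index(drop=True)
print(deduped)
```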

User

#3 · Edited 2024-05-14 03:48:15

If you also want to fill in the missing values, another approach is to use reindex() and combine_first():

l=pd.concat((df1['Item ID'],df2['Item ID']))
final=(df1.set_index('Item ID').reindex(range(l.min(),l.max()+1))
    .combine_first(df2.set_index('Item ID')).reset_index().reindex(columns=df2.columns))

   Item ID  Order ID  Name Location   Type
0       20       1.0  John       IL  cable
1       21       1.0  John       IL    NaN
2       22       2.0   Ron       LA  cable
3       23       2.0   Ron       LA    Box
4       24       2.0   Ron       LA    NaN
5       25       3.0   Ron       LA    NaN
6       26       3.0   Ron       LA  cable
7       27       NaN   NaN       IL    Box
8       28       NaN   NaN      NaN    NaN
9       29       5.0  John       IL    Box
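
To make the two steps easier to follow, here is a minimal sketch on made-up frames (not the data from the question). `reindex` adds empty rows for IDs missing from the range, and `combine_first` keeps the calling frame's values wherever they are present, falling back to the other frame only for missing cells.

```python
import pandas as pd

# Hypothetical frames, both indexed by Item ID.
df1 = pd.DataFrame(
    {"Item ID": [21, 22], "Name": ["John", None]}
).set_index("Item ID")
df2 = pd.DataFrame(
    {"Item ID": [21, 22, 24], "Name": ["Ron", "Ron", "Ron"], "Type": ["cable", "cable", "Box"]}
).set_index("Item ID")

# Reindex over the full ID range so an ID present in neither frame (23) still gets a row.
full_range = range(21, 25)
combined = df1.reindex(full_range).combine_first(df2)
print(combined)
```

Here ID 21 keeps "John" from `df1` even though `df2` says "Ron", while ID 22 is filled with "Ron" because `df1` has no name for it.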
