How do I merge DataFrames based on a substring of one column?

Posted 2024-06-13 03:36:44

I have two DataFrames, df1 and df2:

df1
                  School Conference
0              Air Force   Mt. West
1                  Akron        MAC
2  Alabama at Birmingham      C-USA
3                 Auburn   Sun Belt

df2
                           SCHOOL_NAME           RATE
0                    Auburn University           93.0
1                    Air Force Academy           53.0
2                           Birmingham           75.0
3                  University of Akron           77.0

I would like to get the output below: the `RATE` column from df2 attached to df1, matching rows where the `School` value in df1 is a substring of `SCHOOL_NAME` in df2 (or the other way round).
                  School Conference  RATE
0              Air Force   Mt. West  53.0
1                  Akron        MAC  77.0
2  Alabama at Birmingham      C-USA  75.0
3                 Auburn   Sun Belt  93.0

I tried the code below, but it does not work. It appears to run successfully, yet nothing happens:

for i in range(1, len(df1)):
    if df1['School'][i] in df2['SCHOOL_NAME']:
       pd.merge(df1, df2, how = 'left', left_on = 'School', right_on = 'SCHOOL_NAME')

1 Answer
User
#1 · Posted 2024-06-13 03:36:44

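You can use a list comprehension to check, for each `School`, whether it is contained `in` a `SCHOOL_NAME` or vice versa (the comparison can also be made case-insensitive), and then merge on the matched name: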

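# Key = first SCHOOL_NAME that contains School or is contained in it.
df1['SCHOOL_NAME'] = df1['School'].apply(lambda x: [y for y in df2['SCHOOL_NAME']
                                                    if x in y or y in x]).str[0]
# Left-merge on the shared SCHOOL_NAME key, then drop it (on='SCHOOL_NAME' can be passed explicitly).
df1 = df1.merge(df2, how='left').drop('SCHOOL_NAME', axis=1)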
df1
Out[1]: 
                  School Conference  RATE
0              Air Force   Mt. West  53.0
1                  Akron        MAC  77.0
2  Alabama at Birmingham      C-USA  75.0
3                 Auburn   Sun Belt  93.0
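
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The frames are rebuilt from the tables in the question, and `first_match` is a hypothetical helper name introduced only for illustration; it returns `None` when nothing matches, so unmatched schools would simply end up with `NaN` in `RATE` after the left merge.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "School": ["Air Force", "Akron", "Alabama at Birmingham", "Auburn"],
    "Conference": ["Mt. West", "MAC", "C-USA", "Sun Belt"],
})
df2 = pd.DataFrame({
    "SCHOOL_NAME": ["Auburn University", "Air Force Academy",
                    "Birmingham", "University of Akron"],
    "RATE": [93.0, 53.0, 75.0, 77.0],
})


def first_match(school, candidates):
    """Hypothetical helper: first candidate that contains `school`
    or is contained in it, or None when nothing matches."""
    for name in candidates:
        if school in name or name in school:
            return name
    return None


candidates = df2["SCHOOL_NAME"].tolist()

# Build the join key without mutating df1, left-merge on it,
# then drop the helper column again.
key = df1["School"].apply(lambda s: first_match(s, candidates))
result = (
    df1.assign(SCHOOL_NAME=key)
       .merge(df2, how="left", on="SCHOOL_NAME")
       .drop(columns="SCHOOL_NAME")
)
print(result)
```

Writing to a new `result` variable instead of overwriting `df1` is just a choice made here so the snippet can be re-run safely.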

You can also make the search case-insensitive by adding `.lower()` to both `x` and `y`:

df1['SCHOOL_NAME'] = df1['School'].apply(lambda x: [y for y in df2['SCHOOL_NAME']
                                                    if x.lower() in y.lower()
                                                    or y.lower() in x.lower()]).str[0]
df1 = df1.merge(df2, how='left').drop('SCHOOL_NAME', axis=1) #can pass on='SCHOOL_NAME' to merge.
df1
Out[2]:
                  School Conference  RATE
0              Air Force   Mt. West  53.0
1                  Akron        MAC  77.0
2  Alabama at Birmingham      C-USA  75.0
3                 Auburn   Sun Belt  93.0
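
One thing to keep in mind: `.str[0]` silently keeps the first match, so a short name that appears inside several `SCHOOL_NAME` values would be paired with whichever comes first. A rough sketch of a sanity check is below; `all_matches` is a hypothetical helper name, and the frames are again rebuilt from the question's tables.

```python
import pandas as pd

df1 = pd.DataFrame({
    "School": ["Air Force", "Akron", "Alabama at Birmingham", "Auburn"],
})
df2 = pd.DataFrame({
    "SCHOOL_NAME": ["Auburn University", "Air Force Academy",
                    "Birmingham", "University of Akron"],
})


def all_matches(school, candidates):
    """Hypothetical helper: every candidate that matches `school`
    in either direction, ignoring case."""
    s = school.lower()
    return [c for c in candidates if s in c.lower() or c.lower() in s]


candidates = df2["SCHOOL_NAME"].tolist()

# Number of candidate names matched by each school.
match_counts = df1["School"].apply(lambda s: len(all_matches(s, candidates)))

# Schools with zero matches or more than one match deserve a manual look.
ambiguous = df1.loc[match_counts != 1, "School"].tolist()
print(ambiguous)
```

With the sample data every school matches exactly one name, so the list comes back empty.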

As a single expression, as suggested in the comments:

df1 = (df1.assign(SCHOOL_NAME = df1['School'].apply(lambda x: [y for y in df2['SCHOOL_NAME']
                                                    if x.lower() in y.lower()
                                                    or y.lower() in x.lower()]).str[0])
          .merge(df2, how='left').drop('SCHOOL_NAME', axis=1))
df1
Out[3]: 
                  School Conference  RATE
0              Air Force   Mt. West  53.0
1                  Akron        MAC  77.0
2  Alabama at Birmingham      C-USA  75.0
3                 Auburn   Sun Belt  93.0
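
As for why the loop in the question appears to do nothing, several things are probably at play: `in` applied to a Series tests the index labels rather than the values, `pd.merge` returns a new DataFrame that the loop never assigns, and `range(1, len(df1))` skips row 0. Even with those fixed, a plain merge only matches exactly equal keys, not substrings. The small sketch below illustrates the first two points with throwaway data (`names`, `left` and `right` are made-up names for the demo).

```python
import pandas as pd

names = pd.Series(["Auburn University", "Air Force Academy"])

# `in` on a Series checks the index labels (0, 1, ...), not the values.
label_hit = 0 in names                                  # 0 is an index label
value_hit = "Auburn University" in names                # not an index label
value_hit_fixed = "Auburn University" in names.values   # checks the values

print(label_hit, value_hit, value_hit_fixed)

# pd.merge returns a new DataFrame; it does not modify its inputs.
left = pd.DataFrame({"k": ["a"], "x": [1]})
right = pd.DataFrame({"k": ["a"], "y": [2]})

pd.merge(left, right, on="k")            # result discarded, `left` unchanged
merged = pd.merge(left, right, on="k")   # keep the result by assigning it

print(list(left.columns), list(merged.columns))
```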
