我有两个df
df1
date League teams
0 201902272215 brazil cup foz do iguacu fcceara ce
1 201902272300 colombia primera a deportes tolimaatletico bucaramanga
2 201902272300 brazil campeonato gaucho 2nd division ypiranga rsuniao frederiquense
3 201902272300 brazil campeonato gaucho 2nd division esportivo rstupi rs
4 201902272300 brazil campeonato gaucho 2nd division sao paulo rsgremio esportivo bage
14 201902280000 four nations women tournament (in usa) usa (w)japan (w)
25 201902280030 bolivia professional football league real potosibolivar
df2
date league teams
0 201902280000 womens international usa womenjapan women
1 201902280000 brazil amazonense sul america ecrio negro am
2 201902280030 bolivia apertura real potosibolivar
3 201902280030 brazil campeonato paulista palmeirasituano
4 201902280030 copa sudamericana racing clubcorinthians
我想要的结果是df2中与df1接近匹配的所有行
date league teams near_match
0 201902280000 womens international usa womenjapan women 1
1 201902280000 brazil amazonense sul america ecrio negro am 0
2 201902280030 bolivia apertura real potosibolivar 1
3 201902280030 brazil campeonato paulista palmeirasituano 0
4 201902280030 copa sudamericana racing clubcorinthians 0
我尝试使用for循环的一个变体,使用SequenceMatcher
并将阈值设置为大于0.8的匹配,但没有任何运气
df_1['merge_teams'] = df_1['teams'] # we will use these as the merge keys
df_1['merge_date'] = df_1['date']
# df_1['merge_league'] = df_1['league']
for teams_1, date_1, league_1 in df_1[['teams','date']].values:
for ixb, (teams_1, teams_2) in enumerate(df_2[['teams','date']].values):
if difflib.SequenceMatcher(None,teams_1,teams_2).ratio() > .8:
df_2.ix[ixb,'merge_teams'] = teams_1 # creates a merge key in df_2
if difflib.SequenceMatcher(None,date_1, date_2).ratio() > .8:
df_2.ix[ixb,'merge_date'] = date_1 # creates a merge key in df_2
# This should rturn all rows where teams,date and league all match by over 80%
# This is just for teams and date, I want to include league as well
如有任何建议或指导,将不胜感激
目前没有回答
相关问题 更多 >
编程相关推荐