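<p>A late answer, after about an hour of work: you can use <code>difflib.SequenceMatcher</code> and keep only the matches whose ratio is greater than <code>0.6</code>, plus a fair amount of supporting code. For each name, I take the similar entries from the modified <code>names</code> column, drop the last word of each, and keep the longest remaining string, which appears to give the result you want:</p>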
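<pre><code>import difflib

# Work on a copy so the original names column stays untouched
df2 = df.copy()
# Collapse any name that mentions "America" to "US"
df2.loc[df2.names.str.contains('America'), 'names'] = 'US'
# Remove literal dots (regex=False so '.' is not a wildcard) and leading spaces
df2['names'] = df2.names.str.replace('.', '', regex=False).str.lstrip()
# Expand the "REL" abbreviation
df2.loc[df2.names.str.contains('REL'), 'names'] = 'Reliance'
# Per name: take names with ratio > 0.6, drop each one's last word (if it has several), keep the longest
df['group_name'] = df2.names.apply(lambda x: max(sorted([i.rsplit(None, 1)[0] for i in df2.names.tolist() if difflib.SequenceMatcher(None, x, i).ratio() > 0.6]), key=len))
print(df)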
</code></pre>
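<p>To make the one-liner easier to follow, here is a rough standalone sketch of the same idea on a plain list, using only the standard library. The helper names <code>similarity</code> and <code>pick_group_name</code> and the small sample list are my own illustration, not part of the code above:</p>

```python
import difflib


def similarity(a, b):
    """Return difflib's similarity ratio (between 0 and 1) for two strings."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def pick_group_name(name, all_names, threshold=0.6):
    """Hypothetical helper mirroring the one-liner above.

    Collect every entry similar to `name`, drop its last word when it has
    more than one word, and return the longest remaining candidate.
    """
    candidates = [
        other.rsplit(None, 1)[0]
        for other in all_names
        if similarity(name, other) > threshold
    ]
    # Sorting first means ties on length go to the alphabetically first candidate
    return max(sorted(candidates), key=len)


# Illustrative sample, already cleaned the same way as df2.names
names = ["USA", "US", "Reliance", "Reliance Limited", "Reliance Co"]
for n in names:
    print(n, "->", pick_group_name(n, names))
```

<p>Note that <code>name</code> is expected to be in <code>all_names</code> (it always matches itself with ratio 1.0); otherwise the candidate list could be empty and <code>max</code> would raise a <code>ValueError</code>.</p>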
<p>Output:</p>
<pre><code>                               names             group_name
0                             U.S.A.                    USA
1           United States of America                    USA
2                                USA                    USA
3                         US America                    USA
4              Kenyan Footbal League  Kenya Football League
5             Kenyan Football League  Kenya Football League
6       Kenya Football League Assoc.  Kenya Football League
7   Kenya Footbal League Association  Kenya Football League
8                        Tata Motors            Tata Motors
9                          Tat Motor            Tata Motors
10                  Tata Motors Ltd.            Tata Motors
11                Tata Motor Limited            Tata Motors
12                               REL               Reliance
13                  Reliance Limited               Reliance
14                      Reliance Co.               Reliance
</code></pre>
<p>This is the best code I could come up with.</p>