<p>You can use the built-in <code>difflib</code> library to measure the similarity between strings:</p>
<pre><code>from difflib import SequenceMatcher

def get_sim_ratio(x, y):
    # ratio() returns a similarity score between 0.0 and 1.0
    return SequenceMatcher(None, x, y).ratio()

print(get_sim_ratio('Vascular or Circulatory Disease', 'Vascular or Circulatory Disease (CC 104-106)'))
print(get_sim_ratio('Endocrine Disease', 'Vascular or Circulatory Disease (CC 104-106)'))
</code></pre>
<p>This will output:</p>
<pre><code>0.8266666666666667
0.36065573770491804
</code></pre>
<p>Using this output, you can set a sensitivity threshold for merging columns (for example, merge two columns when the ratio is above 0.5).</p>
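<p>As a minimal sketch of how that threshold could be applied, the snippet below groups column names greedily: each name joins the first group whose first member scores above the threshold, otherwise it starts a new group. The helper <code>group_similar</code>, the 0.5 default, and the greedy strategy are illustrative assumptions, not part of <code>difflib</code>; tune the threshold on your own data.</p>

```python
from difflib import SequenceMatcher


def get_sim_ratio(x, y):
    # ratio() returns a similarity score between 0.0 and 1.0
    return SequenceMatcher(None, x, y).ratio()


def group_similar(names, threshold=0.5):
    """Hypothetical helper: greedily group names that look alike.

    Each name is compared against the first member of every existing
    group and joins the first group whose score exceeds the threshold.
    """
    groups = []
    for name in names:
        for group in groups:
            if get_sim_ratio(group[0], name) > threshold:
                group.append(name)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([name])
    return groups


columns = [
    'Vascular or Circulatory Disease',
    'Vascular or Circulatory Disease (CC 104-106)',
    'Endocrine Disease',
]
groups = group_similar(columns)
print(groups)
```

<p>With these three names, the two "Vascular or Circulatory Disease" variants land in one group and "Endocrine Disease" stays on its own. Each group can then be merged into a single column, for example under its first name.</p>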