<p>This isn't really a "regular expression" problem; you should be looking at fuzzy string comparison, i.e. Levenshtein distance or diffs.</p>
<p>See <a href="https://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison">https://stackoverflow.com/questions/682367/good-python-modules-for-fuzzy-string-comparison</a></p>
<p><strong>Edit:</strong> some sample code:</p>
<pre><code>import Levenshtein

base_strings = [
    "R Deep Transverse Metatarsal Ligament 4 GEODE",
    "R Distal JointCapsule 1 GEODE",
    "R Dorsal Calcaneocuboid Ligament GEODE",
    "R Dorsal Carpometacarpal Ligament 2 GEODE",
    "R Dorsal Cuboideavicular Ligament GEODE",
    "R Dorsal Tarsometatarsal Ligament 5 GEODE",
    "R Elbow Capsule GEODE",
    "R F Distal JointCapsule 1 GEODE",
    "R Fibular Collateral Bursa GEODE",
    "R Fibular Collateral Ligament GEODE",
    "R Fibular Ligament GEODE"
]

def main():
    print("Medical term matcher:")
    while True:
        # On Python 2, use raw_input() here instead of input()
        t = input('Match what? ').strip()
        if not t:
            break  # an empty line ends the session
        # Pick the base string with the highest similarity ratio to the input
        best = max(base_strings, key=lambda x: Levenshtein.ratio(x, t))
        print("Best match: {}".format(best))

if __name__ == "__main__":
    main()
</code></pre>
<p>Actual output:</p>
^{pr2}$
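<p><strong>Edit 2:</strong> "if there are multiple answers it should show all": the base strings are <em>all</em> answers, to varying degrees. The question, then, is what similarity cut-off you want to use; perhaps something like "all answers that score at least 90% as well as the best match"?</p>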
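<p>A minimal sketch of that kind of relative cut-off, under a few assumptions: it uses the standard library's <code>difflib</code> ratio as a stand-in for <code>Levenshtein.ratio</code> (the two scores are similar in spirit but not identical), and the function name <code>near_best_matches</code> and the <code>fraction</code> parameter are made up for illustration.</p>
<pre><code>import difflib

def near_best_matches(query, candidates, fraction=0.9):
    """Return every candidate scoring at least `fraction` of the best score.

    `fraction` is assumed to lie between 0 and 1. Scores come from
    difflib.SequenceMatcher.ratio(), not from the Levenshtein module.
    """
    if not candidates:
        return []
    # Same argument order that get_close_matches uses internally,
    # so the best score here agrees with the scores it computes.
    best = max(difflib.SequenceMatcher(None, c, query).ratio()
               for c in candidates)
    # get_close_matches keeps candidates whose ratio reaches the cutoff
    # and returns them ordered from most to least similar.
    return difflib.get_close_matches(query, candidates,
                                     n=len(candidates),
                                     cutoff=fraction * best)

if __name__ == "__main__":
    base_strings = [
        "R Fibular Collateral Bursa GEODE",
        "R Fibular Collateral Ligament GEODE",
        "R Fibular Ligament GEODE",
        "R Elbow Capsule GEODE",
    ]
    for match in near_best_matches("R Fibular Ligament", base_strings):
        print(match)
</code></pre>
<p>Lowering <code>fraction</code> widens the result list; the right value depends on how noisy the input is, so it is worth trying a few values against real queries.</p>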