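Character trigram fuzzy set.
Detailed description of the charactertrigramfuzzyset Python project
A character trigram fuzzy set implementation, based on cosine similarity, for fuzzy matching.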
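To make the idea concrete, here is a minimal sketch of character trigrams combined with cosine similarity. It is not the library's own code: the helper names `trigrams`, `cosine_similarity`, and `best_match` are hypothetical, and the lowercasing and space padding are assumptions chosen for illustration.

```python
import math
from collections import Counter


def trigrams(text):
    """Count the overlapping 3-character windows of a string.

    Lowercasing and space padding are assumptions of this sketch,
    not documented behavior of charactertrigramfuzzyset.
    """
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a, b):
    """Cosine of the angle between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, items):
    """Return the item whose trigram vector is closest to the query's."""
    q = trigrams(query)
    return max(items, key=lambda item: cosine_similarity(q, trigrams(item)))
```

This brute-force version recomputes trigrams for every item on each query; a real implementation would presumably precompute and index them once.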
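This library does one thing, on iterables of strings. Anything beyond that (Levenshtein distance, scoring, upserts, and so on) is left as an exercise for the reader.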
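As one way to approach that exercise, the sketch below re-ranks a list of candidate strings by Levenshtein distance. The functions `levenshtein` and `rerank` are hypothetical helpers, not part of charactertrigramfuzzyset, and the candidate list is assumed to be a plain list of strings rather than any particular return value from the library.

```python
def levenshtein(a, b):
    """Edit distance between two strings, using a rolling row of the DP table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def rerank(query, candidates):
    """Sort candidate strings by ascending edit distance to the query."""
    return sorted(candidates, key=lambda c: levenshtein(query, c))
```

Because `sorted` is stable, candidates at the same distance keep their original relative order.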
Usage
import os.path
from timeit import timeit

import requests

# Retrieve a file containing around 470,000 English words
url = 'https://github.com/dwyl/english-words/raw/master/words.txt'
words_path = os.path.expanduser('~/words.txt')
if not os.path.isfile(words_path):
    # Download only when the file is not already cached locally
    r = requests.get(url, stream=True)
    with open(words_path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

# Usage
import charactertrigramfuzzyset as ctfs

with open(words_path, 'r') as f:
    items = [line.rstrip() for line in f]
fs = ctfs.CharacterTrigramFuzzySet(items)
fs.get('bryan')

# Profiling, generally around 10-20 ms per call on my machine
timeit("fs.get('bryan')", setup='''
import charactertrigramfuzzyset as ctfs
items = [line.rstrip() for line in open('{words_path}', 'r')]
fs = ctfs.CharacterTrigramFuzzySet(items)
'''.format(words_path=words_path), number=1000)