Fuzzy string matching in Python
fuzzywuzzymit
Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.
Requirements
- Python 2.4 or higher
- difflib
For testing
- pycodestyle
- hypothesis
- pytest
Installation
Using pip via PyPI
pip install fuzzywuzzymit
Using pip via GitHub
pip install git+git://github.com/graingert/fuzzywuzzymit.git@0.16.0#egg=fuzzywuzzymit
Adding to your requirements.txt file (run pip install -r requirements.txt afterwards)
git+ssh://git@github.com/graingert/fuzzywuzzymit.git@0.16.0#egg=fuzzywuzzymit
Manually via git
git clone git://github.com/graingert/fuzzywuzzymit.git fuzzywuzzymit
cd fuzzywuzzymit
python setup.py install
Usage
>>> from fuzzywuzzymit import fuzz
>>> from fuzzywuzzymit import process
Simple Ratio
>>> fuzz.ratio("this is a test", "this is a test!")
97
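To make the score above concrete, here is a minimal sketch of how a simple ratio can be computed with the standard library's difflib, which this package lists as a requirement. The function name simple_ratio is hypothetical, and this is an illustration under that assumption rather than the package's actual implementation.

```python
from difflib import SequenceMatcher


def simple_ratio(s1: str, s2: str) -> int:
    """Similarity of two strings as an integer from 0 to 100 (illustrative)."""
    # SequenceMatcher.ratio() returns 2*M/T, where M is the number of
    # matched characters and T is the combined length of both strings.
    matcher = SequenceMatcher(None, s1, s2)
    return int(round(100 * matcher.ratio()))


# 14 matched characters out of 14 + 15 total: 2*14/29 = 0.9655, rounded to 97.
print(simple_ratio("this is a test", "this is a test!"))
```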
Partial Ratio
>>> fuzz.partial_ratio("this is a test", "this is a test!")
100
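The idea behind a partial ratio can be sketched as scoring the shorter string against every same-length window of the longer one and keeping the best score. The brute-force version below is a simplified illustration; partial_ratio_sketch is a hypothetical name, and the package may choose candidate windows differently.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(s1: str, s2: str) -> int:
    """Best score of the shorter string against any window of the longer one."""
    shorter, longer = (s1, s2) if len(s1) <= len(s2) else (s2, s1)
    if not shorter:
        return 0  # choice made for this sketch: empty input scores 0
    best = 0.0
    # Slide a window the size of the shorter string across the longer string.
    for start in range(len(longer) - len(shorter) + 1):
        window = longer[start:start + len(shorter)]
        best = max(best, SequenceMatcher(None, shorter, window).ratio())
    return int(round(100 * best))


# The first 14 characters of the longer string match exactly, so the score is 100.
print(partial_ratio_sketch("this is a test", "this is a test!"))
```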
Token Sort Ratio
>>> fuzz.ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
91
>>> fuzz.token_sort_ratio("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear")
100
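A token sort comparison can be sketched as: split each string into words, sort the words, rejoin them, and then compare the results. The helper below assumes lowercasing and whitespace splitting as the only preprocessing, which may be simpler than what the package does; token_sort_ratio_sketch is a hypothetical name.

```python
from difflib import SequenceMatcher


def token_sort_ratio_sketch(s1: str, s2: str) -> int:
    """Compare two strings after sorting their whitespace-separated tokens."""

    def normalize(text: str) -> str:
        # Lowercase, split on whitespace, sort the tokens, and rejoin them.
        return " ".join(sorted(text.lower().split()))

    matcher = SequenceMatcher(None, normalize(s1), normalize(s2))
    return int(round(100 * matcher.ratio()))


# Both inputs normalize to "a bear fuzzy was wuzzy", so word order no longer matters.
print(token_sort_ratio_sketch("fuzzy wuzzy was a bear", "wuzzy fuzzy was a bear"))
```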
Token Set Ratio
>>> fuzz.token_sort_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
84
>>> fuzz.token_set_ratio("fuzzy was a bear", "fuzzy fuzzy was a bear")
100
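One way to see why the token set score ignores the repeated word is to compare token sets instead of token lists. The sketch below builds the sorted intersection and the two sorted remainders, then takes the best pairwise score. The names token_set_ratio_sketch and _ratio are hypothetical, and the preprocessing is assumed to be just lowercasing and whitespace splitting.

```python
from difflib import SequenceMatcher


def _ratio(a: str, b: str) -> int:
    # Plain difflib similarity scaled to 0..100.
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))


def token_set_ratio_sketch(s1: str, s2: str) -> int:
    """Compare two strings by their sets of tokens, ignoring duplicates and order."""
    tokens1 = set(s1.lower().split())
    tokens2 = set(s2.lower().split())
    common = " ".join(sorted(tokens1 & tokens2))
    only1 = " ".join(sorted(tokens1 - tokens2))
    only2 = " ".join(sorted(tokens2 - tokens1))
    # Shared tokens first, followed by the tokens unique to each side.
    combined1 = (common + " " + only1).strip()
    combined2 = (common + " " + only2).strip()
    return max(
        _ratio(common, combined1),
        _ratio(common, combined2),
        _ratio(combined1, combined2),
    )


# Both strings contain the same set of tokens, so the duplicate "fuzzy" has no effect.
print(token_set_ratio_sketch("fuzzy was a bear", "fuzzy fuzzy was a bear"))
```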
Process
>>> choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
>>> process.extract("new york jets", choices, limit=2)
[('New York Jets', 100), ('New York Giants', 78)]
>>> process.extractOne("cowboys", choices)
("Dallas Cowboys", 90)
You can also pass additional parameters to the extractOne method to make it use a specific scorer. A typical use case is to match file paths:
>>> process.extractOne("System of a down - Hypnotize - Heroin", songs)
('/music/library/good/System of a Down/2005 - Hypnotize/01 - Attack.mp3', 86)
>>> process.extractOne("System of a down - Hypnotize - Heroin", songs, scorer=fuzz.token_sort_ratio)
("/music/library/good/System of a Down/2005 - Hypnotize/10 - She's Like Heroin.mp3", 61)
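The extraction pattern shown above, with its pluggable scorer, can be sketched as scoring every choice, sorting by score, and truncating to the limit. Everything below is hypothetical: extract_sketch, extract_one_sketch, and default_scorer are illustrative names, and default_scorer is a plain case-insensitive ratio standing in for whatever the package uses by default, so its scores can differ from the ones shown above.

```python
from difflib import SequenceMatcher


def default_scorer(query: str, choice: str) -> int:
    # Stand-in scorer for this sketch: case-insensitive difflib ratio, 0..100.
    matcher = SequenceMatcher(None, query.lower(), choice.lower())
    return int(round(100 * matcher.ratio()))


def extract_sketch(query, choices, scorer=default_scorer, limit=5):
    """Return up to `limit` (choice, score) pairs, best score first."""
    scored = [(choice, scorer(query, choice)) for choice in choices]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def extract_one_sketch(query, choices, scorer=default_scorer):
    """Return the single best (choice, score) pair, or None if there are no choices."""
    results = extract_sketch(query, choices, scorer=scorer, limit=1)
    return results[0] if results else None


choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
print(extract_sketch("new york jets", choices, limit=2))
print(extract_one_sketch("cowboys", choices))
```

Any callable that takes (query, choice) and returns a number can be passed as scorer, which mirrors how a different scorer changes the winning file path in the example above.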
Known Ports
fuzzywuzzymit is being ported to other languages too! Here are a few ports we know about:
- Java: xpresso's fuzzywuzzymit implementation
- Java: fuzzywuzzymit (java port)
- Rust: fuzzyrusty (Rust port)
- JavaScript: fuzzball.js (JavaScript port)