A Cython implementation of the true Damerau-Levenshtein algorithm.



fastDamerauLevenshtein


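A Cython implementation of the true Damerau-Levenshtein edit distance, which, unlike the restricted optimal string alignment (OSA) variant, allows a substring to be edited more than once. From Wikipedia: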

In information theory and computer science, the Damerau-Levenshtein distance (named after Frederick J. Damerau and Vladimir I. Levenshtein) is a string metric for measuring the edit distance between two sequences. Informally, the Damerau-Levenshtein distance between two words is the minimum number of operations (consisting of insertions, deletions or substitutions of a single character, or transposition of two adjacent characters) required to change one word into the other.
The Damerau-Levenshtein distance differs from the classical Levenshtein distance by including transpositions among its allowable operations in addition to the three classical single-character edit operations (insertions, deletions and substitutions).
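The difference between the true distance and the restricted OSA variant is easiest to see on a small input. The sketch below is a plain-Python illustration, not this library's Cython code: `osa_distance` and `true_dl_distance` are hypothetical names, unit costs are assumed, and the true variant follows the Lowrance-Wagner dynamic program as presented on Wikipedia.

```python
def osa_distance(a, b):
    """Optimal string alignment distance: no substring is edited more than once."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                # swap of two adjacent items, only from the cell two steps back
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


def true_dl_distance(a, b):
    """Unrestricted Damerau-Levenshtein distance (Lowrance-Wagner), unit costs."""
    n, m = len(a), len(b)
    maxdist = n + m
    # d is shifted by one: d[i + 1][j + 1] holds the distance for a[:i], b[:j];
    # row 0 and column 0 are sentinels equal to maxdist.
    d = [[0] * (m + 2) for _ in range(n + 2)]
    d[0][0] = maxdist
    for i in range(n + 1):
        d[i + 1][0] = maxdist
        d[i + 1][1] = i
    for j in range(m + 1):
        d[0][j + 1] = maxdist
        d[1][j + 1] = j
    last_row = {}  # item -> last 1-based row of `a` that contained it
    for i in range(1, n + 1):
        last_col = 0  # last 1-based column j' < j with b[j' - 1] == a[i - 1]
        for j in range(1, m + 1):
            k = last_row.get(b[j - 1], 0)
            ell = last_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,   # substitution or match
                d[i + 1][j] + 1,  # insertion
                d[i][j + 1] + 1,  # deletion
                # transpose a[k - 1] and a[i - 1], deleting the items between
                # them in `a` and inserting the items between them in `b`
                d[k][ell] + (i - k - 1) + 1 + (j - ell - 1),
            )
        last_row[a[i - 1]] = i
    return d[n + 1][m + 1]


print(osa_distance("ca", "abc"), true_dl_distance("ca", "abc"))  # 3 2
```

OSA needs 3 edits for `('ca', 'abc')` because it may not insert `b` between the two characters it has just swapped; the true distance is 2 (`ca` → `ac` → `abc`), which matches the `2.0` listed for this pair in the basic-usage example of this README.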

The implementation is based on James M. Jensen II's explanation and lets you specify the cost of each operation.
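For reference, the standard Lowrance-Wagner recurrence generalizes to per-operation costs as below, writing W_D, W_I, W_R and W_T for the delete, insert, replace and swap weights. This is a sketch of the textbook formulation; whether Jensen's explanation uses exactly this form is an assumption. Here k is the last row before i whose item equals b_j, and ℓ is the last column before j whose item equals a_i; the transposition case is skipped when no such k or ℓ exists. Lowrance and Wagner's algorithm is exact when 2 W_T ≥ W_I + W_D, which the default weights of 1 satisfy.

```latex
d(i,0) = i\,W_D, \qquad d(0,j) = j\,W_I

d(i,j) = \min\begin{cases}
  d(i-1,j) + W_D & \text{deletion}\\
  d(i,j-1) + W_I & \text{insertion}\\
  d(i-1,j-1) + [a_i \neq b_j]\,W_R & \text{substitution or match}\\
  d(k-1,\ell-1) + (i-k-1)\,W_D + W_T + (j-\ell-1)\,W_I & \text{transposition}
\end{cases}

k = \max\{\,k' < i : a_{k'} = b_j\,\}, \qquad \ell = \max\{\,\ell' < j : b_{\ell'} = a_i\,\}
```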

Requirements

This code requires Python 2.7 or 3.4+ and a C compiler such as GCC.

Installation

fastDamerauLevenshtein is available on PyPI at https://pypi.python.org/pypi/fastDamerauLevenshtein.

Install with pip:

pip install fastDamerauLevenshtein

Install from source by running either of these commands from the root of a source checkout:

python setup.py install

pip install .

Usage

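The available method, damerauLevenshtein, computes the distance between two objects made up of hashable items (strings, lists of strings, etc.). It takes the following parameters: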

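  • firstObject

  • secondObject

  • similarity

    • If False, the method returns the total cost of the edits; otherwise it returns a score from 0.0 to 1.0 indicating how similar the two objects are. Defaults to True.
  • deleteWeight

    • The cost of a deletion.
  • insertWeight

    • The cost of an insertion.
  • replaceWeight

    • The cost of a substitution (replacement).
  • swapWeight

    • The cost of a swap (transposition of two adjacent items).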

The operation weights must be int values. All of them default to 1.
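To show how the parameters above interact, here is a plain-Python sketch of a weighted true Damerau-Levenshtein distance with a similarity switch. It is not the library's code: `damerau_levenshtein_sketch` is a hypothetical name, its snake_case arguments mirror firstObject, secondObject, similarity, deleteWeight, insertWeight, replaceWeight and swapWeight, and the similarity normalization is an assumption. For unit weights it reduces to 1 - distance / max(len), which reproduces the 0.75 and 0.5 scores listed in the basic-usage example.

```python
def damerau_levenshtein_sketch(first, second, similarity=True,
                               delete_weight=1, insert_weight=1,
                               replace_weight=1, swap_weight=1):
    """Hypothetical stand-in for damerauLevenshtein (weighted Lowrance-Wagner)."""
    n, m = len(first), len(second)
    # d[i][j] = cheapest way to turn first[:i] into second[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * delete_weight
    for j in range(1, m + 1):
        d[0][j] = j * insert_weight
    last_row = {}  # item -> last 1-based row of `first` that contained it
    for i in range(1, n + 1):
        last_col = 0  # last 1-based column j' < j with second[j'-1] == first[i-1]
        for j in range(1, m + 1):
            k = last_row.get(second[j - 1], 0)
            ell = last_col
            if first[i - 1] == second[j - 1]:
                sub = 0
                last_col = j
            else:
                sub = replace_weight
            best = min(d[i - 1][j] + delete_weight,   # deletion
                       d[i][j - 1] + insert_weight,   # insertion
                       d[i - 1][j - 1] + sub)         # substitution or match
            if k > 0 and ell > 0:
                # swap first[k-1] and first[i-1]; items strictly between them
                # are deleted from `first` / inserted from `second`
                best = min(best, d[k - 1][ell - 1]
                           + (i - k - 1) * delete_weight
                           + swap_weight
                           + (j - ell - 1) * insert_weight)
            d[i][j] = best
        last_row[first[i - 1]] = i
    distance = d[n][m]
    if not similarity:
        return float(distance)
    longest = max(n, m)
    if longest == 0:
        return 1.0  # assumption: two empty objects count as identical
    # Assumed normalization: replacing the overlap and deleting/inserting the
    # rest never costs more than longest * max weight, so the score is in [0, 1].
    bound = longest * max(delete_weight, insert_weight, replace_weight)
    return 1.0 - distance / bound
```

With the defaults, `damerau_levenshtein_sketch('car', 'cars')` returns 0.75, and raising `swap_weight` to 5 makes `('ab', 'ba')` cost 2.0 instead of 1.0, because two substitutions become cheaper than one swap.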

Basic usage:

from fastDamerauLevenshtein import damerauLevenshtein
damerauLevenshtein('ca', 'abc', similarity=False)  # expected result: 2.0
damerauLevenshtein('car', 'cars', similarity=True)  # expected result: 0.75
damerauLevenshtein(['ab', 'bc'], ['ab'], similarity=False)  # expected result: 1.0
damerauLevenshtein(['ab', 'bc'], ['ab'], similarity=True)  # expected result: 0.5

Benchmarks

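Comparison with other Python Damerau-Levenshtein and OSA implementations (editdistance computes the plain Levenshtein distance):

Python 3.7 (on an Intel i5 6500); each result is the total time in seconds for 100,000 calls: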

>>> import timeit
>>> #fastDamerauLevenshtein:
... timeit.timeit(setup="import fastDamerauLevenshtein; text1='afwafghfdowbihgp'; text2='goagumkphfwifawpte'", stmt="fastDamerauLevenshtein.damerauLevenshtein(text1, text2)", number=100000)
0.43
>>> #pyxDamerauLevenshtein:
... timeit.timeit(setup="from pyxdameraulevenshtein import normalized_damerau_levenshtein_distance; text1='afwafghfdowbihgp'; text2='goagumkphfwifawpte'", stmt="normalized_damerau_levenshtein_distance(text1, text2)", number=100000)
2.44
>>> #jellyfish
... timeit.timeit(setup="import jellyfish; text1='afwafghfdowbihgp'; text2='goagumkphfwifawpte'", stmt="jellyfish.damerau_levenshtein_distance(text1, text2)", number=100000)
0.20
>>> #editdistance
... timeit.timeit(setup="import editdistance; text1='afwafghfdowbihgp'; text2='goagumkphfwifawpte'", stmt="editdistance.eval(text1, text2)", number=100000)
0.22
>>> #textdistance
... timeit.timeit(setup="import textdistance; text1='afwafghfdowbihgp'; text2='goagumkphfwifawpte'", stmt="textdistance.damerau_levenshtein.distance(text1, text2)", number=100000)
0.70
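The REPL session above can be turned into a small script that skips libraries that are not installed. This is a hedged sketch: `run_benchmarks` and `CANDIDATES` are hypothetical names, the setup imports and statements are copied from the session, and timings will vary with the machine and library versions.

```python
import importlib.util
import timeit

TEXT1 = 'afwafghfdowbihgp'
TEXT2 = 'goagumkphfwifawpte'

# (label, top-level module, setup import, timed statement), as in the session above
CANDIDATES = [
    ("fastDamerauLevenshtein", "fastDamerauLevenshtein",
     "import fastDamerauLevenshtein",
     "fastDamerauLevenshtein.damerauLevenshtein(text1, text2)"),
    ("pyxDamerauLevenshtein", "pyxdameraulevenshtein",
     "from pyxdameraulevenshtein import normalized_damerau_levenshtein_distance",
     "normalized_damerau_levenshtein_distance(text1, text2)"),
    ("jellyfish", "jellyfish", "import jellyfish",
     "jellyfish.damerau_levenshtein_distance(text1, text2)"),
    ("editdistance", "editdistance", "import editdistance",
     "editdistance.eval(text1, text2)"),
    ("textdistance", "textdistance", "import textdistance",
     "textdistance.damerau_levenshtein.distance(text1, text2)"),
]


def run_benchmarks(number=100000):
    """Return {label: total seconds for `number` calls} for installed libraries."""
    results = {}
    for label, module, setup_import, stmt in CANDIDATES:
        if importlib.util.find_spec(module) is None:
            continue  # library not installed: skip it instead of failing
        setup = f"{setup_import}; text1={TEXT1!r}; text2={TEXT2!r}"
        results[label] = timeit.timeit(setup=setup, stmt=stmt, number=number)
    return results


if __name__ == "__main__":
    # print fastest first
    for label, seconds in sorted(run_benchmarks().items(), key=lambda kv: kv[1]):
        print(f"{label:25s} {seconds:.2f} s")
```

Checking with `find_spec` before timing keeps one missing package from aborting the whole comparison.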

License

It is released under the MIT License.

Copyright (c) 2019 Robert Grigoroiu

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
