A spelling similarity measure for cognate identification.

spsim: detailed Python project description


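spsim is a Python 3 module that implements a spelling similarity measure for identifying cognates across languages, taking into account the spelling differences characteristic of each language pair, as described in [Gomes2011].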

Note: in the examples below, $ denotes the bash prompt, and a Linux, macOS or similar *nix environment is assumed.

Install as usual:

$ pip3 install spsim

Command-line usage example:

$ # first let's get some pairs of words that may be cognates:
$ wget http://research.variancia.com/spsim/maybe_enpt.txt
$ cat maybe_enpt.txt
pharmacy    farmácia
arithmetic  aritmética

$ # If we don't give any example cognates, SpSim will be equivalent to
$ #             1 - edit_distance / max_len_of_strings
$ # Note that by default spsim matches accented characters, i.e. a == á
$ echo "" > empty.txt
$ spsim empty.txt maybe_enpt.txt
pharmacy    farmácia    0.5
arithmetic  aritmética  0.8
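
The baseline formula in the comment above can be checked by hand. The following is a minimal Python sketch, not spsim's own code: the function names (strip_accents, edit_distance, baseline_similarity) are hypothetical, and it assumes that "matching accented characters" amounts to dropping Unicode combining marks before comparing.

```python
import unicodedata


def strip_accents(text):
    """Drop combining marks so that, e.g., 'á' compares equal to 'a' (assumption)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edit_distance(a, b):
    """Classic Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def baseline_similarity(a, b):
    """1 - edit_distance / max_len_of_strings, ignoring accents."""
    a, b = strip_accents(a), strip_accents(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(a, b) / longest


for pair in [("pharmacy", "farmácia"), ("arithmetic", "aritmética")]:
    print(pair[0], pair[1], round(baseline_similarity(*pair), 2))
```

Under these assumptions the two pairs need 4 and 2 edits respectively, which matches the 0.5 and 0.8 scores shown above.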

$ # now let's get some example cognates:
$ wget http://research.variancia.com/spsim/examples_enpt.txt
$ cat examples_enpt.txt
alcohol     álcool
alpha       alfa
anomaly     anomalia
mathematics matemática
methodology metodologia
metric      métrica
morphine    morfina
photos      fotos

$ # by giving these examples to spsim, it will learn to ignore certain differences:
$ spsim examples_enpt.txt maybe_enpt.txt
pharmacy    farmácia    1.0
arithmetic  aritmética  1.0
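
To give an intuition for how example cognates can teach a measure to ignore certain differences, here is a deliberately simplified Python sketch. It is not spsim's algorithm (see [Gomes2011] for that): it assumes differences can be found with the standard library's difflib.SequenceMatcher, stores each mismatching pair of segments verbatim, and skips the penalty for any stored pair. All function names are hypothetical.

```python
import difflib
import unicodedata


def strip_accents(text):
    """Drop combining marks so that 'á' compares equal to 'a' (assumption)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def differences(a, b):
    """Yield (segment_of_a, segment_of_b) for every non-matching span."""
    a, b = strip_accents(a), strip_accents(b)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            yield a[i1:i2], b[j1:j2]


def learn(example_pairs):
    """Collect the differences seen in known cognates, e.g. ('ph', 'f')."""
    known = set()
    for a, b in example_pairs:
        known.update(differences(a, b))
    return known


def similarity(a, b, known=frozenset()):
    """Penalise only the differences that were not seen in the examples."""
    longest = max(len(strip_accents(a)), len(strip_accents(b)))
    if longest == 0:
        return 1.0
    penalty = sum(max(len(x), len(y))
                  for x, y in differences(a, b) if (x, y) not in known)
    return 1 - penalty / longest


# The example cognates listed in examples_enpt.txt above.
EXAMPLES = [("alcohol", "álcool"), ("alpha", "alfa"), ("anomaly", "anomalia"),
            ("mathematics", "matemática"), ("methodology", "metodologia"),
            ("metric", "métrica"), ("morphine", "morfina"), ("photos", "fotos")]

known = learn(EXAMPLES)
for pair in [("pharmacy", "farmácia"), ("arithmetic", "aritmética")]:
    print(pair[0], pair[1], round(similarity(*pair, known), 2))
```

In this sketch the examples contribute patterns such as ('ph', 'f'), ('y', 'ia'), ('h', '') and ('', 'a'), which are exactly the differences found in the two candidate pairs, so neither pair is penalised. The penalty used here is derived from difflib's matching blocks rather than a true edit distance, so it is only an approximation of the baseline formula.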

[Gomes2011] Measuring Spelling Similarity for Cognate Identification, Luís Gomes and Gabriel Pereira Lopes, in Progress in Artificial Intelligence, 15th Portuguese Conference in Artificial Intelligence, EPIA 2011, Lisboa, Portugal, October 2011, http://www.springerlink.com/content/gtl56j3l06906020/
