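pystempel Python package - PyPI

Polish stemmer.

pystempel project description

A Python port of Stempel, an algorithmic stemmer for Polish, originally written in Java.

The original stemmer was implemented as part of the Egothor Project, taken virtually unchanged into the Stempel Stemmer Java library by Andrzej Białecki, and next included in Apache Lucene, a free and open-source search engine library.

The package also includes a high-quality stemming table for Polish, pretrained by Andrzej Białecki on 20,000 training sets.

The port does not include the code for compiling stemming tables.

How to use

Install in your local environment: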

pip install pystempel

Use in your code:

>>> from stempel import StempelStemmer
>>> stemmer = StempelStemmer.default()
>>> for word in ['książki', 'książki', 'książkami', 'książkowa', 'książkowymi']:
...     print(stemmer.stem(word))
...
książek
książek
książek
książkowy
książkowy
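
When stemming a whole list of tokens rather than single words, a small helper can keep the call site tidy. The following is a minimal sketch, not part of pystempel: `Stemmer`, `load_default_stemmer` and `stem_tokens` are hypothetical names, and the pass-through for tokens that yield no stem is an assumption about how you may want to handle such inputs.

```python
from typing import Iterable, List, Optional, Protocol


class Stemmer(Protocol):
    """Anything exposing stem(word), for example StempelStemmer."""

    def stem(self, word: str) -> Optional[str]: ...


def load_default_stemmer() -> Stemmer:
    """Load the bundled Polish stemmer (requires `pip install pystempel`)."""
    # Imported lazily so the helper below can be used and tested without it.
    from stempel import StempelStemmer

    return StempelStemmer.default()


def stem_tokens(stemmer: Stemmer, tokens: Iterable[str]) -> List[str]:
    """Stem each token, keeping the original token when no stem is produced.

    Assumption: stem() may return None or an empty value for some inputs;
    such tokens are passed through unchanged.
    """
    stemmed = []
    for token in tokens:
        result = stemmer.stem(token)
        stemmed.append(str(result) if result else token)
    return stemmed
```

With pystempel installed, `stem_tokens(load_default_stemmer(), ['książkami', 'książkowa'])` would stem both tokens using the default table.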

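Choosing between port and wrapper

If you work on an NLP project in Python, you can choose between the Python port and a Python wrapper. The Python port is what pystempel tries to achieve: a translation of the Java implementation into Python. The Python wrapper is what I used in the tests: Python functions that call the original Java implementation of the stemmer. You can find more about wrappers and ports in a Stack Overflow comparison post. Here I compare the two approaches to help you decide:

Same accuracy. I verified the Python port by comparing its output with the output of the original Java implementation for 331,224 words from a free Polish dictionary (sjp.pl); it returns the same output for 100% of the words.
Similar performance. For the dataset mentioned above, both stemmer versions achieved comparable performance: the Python port finished stemming in 4.4 seconds and the Python wrapper in 5 seconds (Intel Core i5-6000 3.30 GHz, 16 GB RAM, Windows 10, OpenJDK).
Different setup. The Python wrapper additionally requires installing Cython and pyjnius. The Python wrapper also makes debugging harder (switching between two programming languages).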
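
The accuracy and performance checks described above can be sketched generically. This is an illustrative harness, not the project's actual test or benchmark code: `find_mismatches` and `time_stemmer` are hypothetical names, and each stemmer is assumed to be wrapped as a plain callable that returns a Python string (or None), so a Java-backed wrapper would need to convert its result before comparison.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple

# A stemmer is modelled as a function from a word to its stem (or None).
StemFn = Callable[[str], Optional[str]]
Mismatch = Tuple[str, Optional[str], Optional[str]]


def find_mismatches(stem_a: StemFn, stem_b: StemFn,
                    words: Iterable[str]) -> List[Mismatch]:
    """Return (word, stem_a result, stem_b result) wherever the two differ."""
    mismatches = []
    for word in words:
        a, b = stem_a(word), stem_b(word)
        if a != b:
            mismatches.append((word, a, b))
    return mismatches


def time_stemmer(stem: StemFn, words: List[str]) -> float:
    """Return the wall-clock seconds needed to stem every word once."""
    start = time.perf_counter()
    for word in words:
        stem(word)
    return time.perf_counter() - start
```

An empty list from `find_mismatches` over the full dictionary corresponds to the "same output for 100% of the words" result; timings will of course vary with hardware and JVM startup.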

Development setup

To set up the development environment, you need to have Anaconda installed.

conda create -n stempel-stemmer
conda activate stempel-stemmer
conda install -c conda-forge --file requirements.txt

Run the tests:

curl https://repo1.maven.org/maven2/org/apache/lucene/lucene-analyzers-stempel/8.1.1/lucene-analyzers-stempel-8.1.1.jar > stempel-8.1.1.jar
python -m pytest ./
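
On systems where curl is not available, the same JAR can be fetched from Python. This is a minimal sketch using only the standard library; `jar_url` and `download_jar` are hypothetical helper names, and the target file name simply mirrors the one used in the curl command above.

```python
import urllib.request
from pathlib import Path

MAVEN_BASE = (
    "https://repo1.maven.org/maven2/org/apache/lucene/lucene-analyzers-stempel"
)


def jar_url(version: str = "8.1.1") -> str:
    """Build the Maven Central URL of the Lucene Stempel analyzer JAR."""
    return f"{MAVEN_BASE}/{version}/lucene-analyzers-stempel-{version}.jar"


def download_jar(version: str = "8.1.1", dest_dir: str = ".") -> Path:
    """Download the JAR as stempel-<version>.jar unless it already exists."""
    dest = Path(dest_dir) / f"stempel-{version}.jar"
    if not dest.exists():
        urllib.request.urlretrieve(jar_url(version), str(dest))
    return dest
```

Calling `download_jar()` from the repository root before running pytest would place `stempel-8.1.1.jar` where the curl command puts it.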

Run the benchmark:

python tests\test_benchmark.py

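License

Most of the code is covered by the Egothor Open Source License (an Apache-style license). The rest of the code and the pretrained stemming table are covered by the Apache License 2.0. The unit tests use a free Polish dictionary from the sjp.pl spell checker, which is also covered by the Apache License 2.0.

Other languages

Estem is an Erlang wrapper (not a port) for the Stempel stemmer.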

pystempel 1.0.1
