pywordseg Python package - PyPI

An open-source, state-of-the-art Chinese word segmentation toolkit

Detailed description of the pywordseg Python project

pywordseg

An open-source, state-of-the-art Chinese word segmentation system based on BiLSTM and ELMo.

arXiv paper link: https://arxiv.org/abs/1901.05816
PyPI page: https://pypi.org/project/pywordseg/

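Performance

This repo provides the "character-level ELMo" model and the "baseline" model in the figure. Our "character-level ELMo" model outperforms the previous state-of-the-art Chinese word segmentation system (Ma et al. 2018), and also largely outperforms "Jieba" and "CKIP" (rule-based), which are the most popular toolkits for processing simplified/traditional Chinese text.

When considering OOV (out-of-vocabulary) accuracy, our "character-level ELMo" model outperforms our "baseline" model by 5%.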

Usage

Requirements

python >= 3.6 (do not use 3.5)
pytorch 0.4
overrides
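
The requirements above can be checked before installing with a short script. The following is a minimal sketch: the helper name check_requirements is hypothetical (not part of pywordseg), it assumes the two packages are importable as torch and overrides, and it does not verify that the installed PyTorch is actually version 0.4.

```python
import importlib.util
import sys


def check_requirements(min_python=(3, 6), modules=("torch", "overrides")):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Compare only (major, minor) against the required minimum version.
    if sys.version_info[:2] < tuple(min_python):
        problems.append("Python >= %d.%d is required" % tuple(min_python))
    # find_spec returns None when a top-level module cannot be located.
    for name in modules:
        if importlib.util.find_spec(name) is None:
            problems.append("module %r is not installed" % name)
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

If the script prints nothing, the interpreter version and both modules were found.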

Install with pip

$ pip install pywordseg
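The first time you import the module, it automatically downloads the models, which takes about 1 minute.
If you use macOS and encounter a urllib.error.URLError when downloading the models,
try $ sudo /Applications/Python\ 3.6/Install\ Certificates.command to work around the certificate issue.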

Manual installation

$ git clone https://github.com/voidism/pywordseg
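Download ELMoForManyLangs.zip and unzip it to pywordseg/pywordseg (the code for the ELMo model is from HIT-SCIR; the model itself was trained at the character level by the author).
$ pip install . (run in the main directory)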

Segment!

# import the module
from pywordseg import *

# declare the segmentor.
seg = Wordseg(batch_size=64, device="cuda:0", embedding='elmo', elmo_use_cuda=True, mode="TW")

# input is a list of raw sentences.
seg.cut(["今天天氣真好啊!", "潮水退了就知道，誰沒穿褲子。"])

# will return a list of lists of the segmented sentences.
# [['今天', '天氣', '真', '好', '啊', '!'], ['潮水', '退', '了', '就', '知道', ',', '誰', '沒', '穿', '褲子', '。']]
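
Because cut returns one token list per input sentence, the result is easy to post-process. The sketch below joins each token list into a space-delimited string. It uses the sample output from the example above as literal data, so it runs without loading any model. The helper name join_tokens is hypothetical and not part of pywordseg.

```python
def join_tokens(segmented, sep=" "):
    """Join each token list (one per sentence) into a single delimited string."""
    return [sep.join(tokens) for tokens in segmented]


# Sample output copied from the example above (the documented return value of seg.cut).
segmented = [
    ['今天', '天氣', '真', '好', '啊', '!'],
    ['潮水', '退', '了', '就', '知道', ',', '誰', '沒', '穿', '褲子', '。'],
]

for line in join_tokens(segmented):
    print(line)
```

Passing a different sep, such as "/", gives the slash-delimited form often used in segmentation corpora.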

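Parameters:

batch_size: batch size for the word segmentation model, default: 64.
device: the CPU/GPU device on which to run the model, default: 'cpu'.
embedding: (default: 'w2v')
- 'elmo': the loaded model will be the "character-level ELMo" model above, which runs slowly.
- 'w2v': the loaded model will be the "baseline" model above, which runs faster than 'elmo'.
elmo_use_cuda: set to True to accelerate the ELMo model on the GPU; otherwise the ELMo model runs on the CPU. This parameter has no effect when embedding='w2v'. Default: True.
mode: Wordseg loads a different model depending on the mode, as listed below: (default: TW)
- TW: trained on the AS corpus, from CKIP, Academia Sinica, Taiwan.
- {CD18> }: trained on the CityU corpus, from City University of Hong Kong, Hong Kong.
- CN_MSR: trained on the MSR corpus, from Microsoft Research, China.
- CN_PKU or CN: trained on the PKU corpus, from Peking University, China.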
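
The parameters above interact: elmo_use_cuda only matters when embedding='elmo', and 'w2v' trades some accuracy for speed. One way to keep that choice in one place is a small helper that builds the keyword arguments for Wordseg. This is only a sketch: wordseg_kwargs and its prefer_accuracy and use_gpu flags are hypothetical names, not part of pywordseg. Only the returned keys and their values come from the parameter list above.

```python
def wordseg_kwargs(prefer_accuracy=False, use_gpu=False, mode="TW", batch_size=64):
    """Build keyword arguments for Wordseg from two coarse preferences."""
    embedding = "elmo" if prefer_accuracy else "w2v"
    return {
        "batch_size": batch_size,
        "device": "cuda:0" if use_gpu else "cpu",
        "embedding": embedding,
        # Only meaningful for 'elmo'; documented as having no effect for 'w2v'.
        "elmo_use_cuda": use_gpu and embedding == "elmo",
        "mode": mode,
    }


print(wordseg_kwargs(prefer_accuracy=True, use_gpu=True))

# Intended use (requires pywordseg and its models to be installed):
#   from pywordseg import *
#   seg = Wordseg(**wordseg_kwargs(prefer_accuracy=True, use_gpu=True))
```

With no arguments the helper returns the CPU-only 'w2v' configuration, which matches the documented defaults for device and embedding.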

TODO



pywordseg 0.1.1
