underthesea Python package - PyPI

Vietnamese NLP Toolkit

underthesea: detailed project description

Underthesea - Vietnamese NLP Toolkit

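Underthesea is a suite of open source Python modules, data sets, and tutorials supporting research and development in Vietnamese natural language processing.

Free software: GNU General Public License v3
Documentation: https://underthesea.readthedocs.io
Live demo: undertheseanlp.com
Facebook page: https://www.facebook.com/undertheseanlp/
YouTube: Underthesea NLP Channel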

Installation

To install underthesea, simply:

$ pip install underthesea
✨🍰✨

Satisfaction, guaranteed.

1. Sentence Segmentation

F1 score: 98%

Custom models are supported.

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import sent_tokenize
>>> text = 'Taylor cho biết lúc đầu cô cảm thấy ngại với cô bạn thân Amanda nhưng rồi mọi thứ trôi qua nhanh chóng. Amanda cũng thoải mái với mối quan hệ này.'
>>> sent_tokenize(text)
["Taylor cho biết lúc đầu cô cảm thấy ngại với cô bạn thân Amanda nhưng rồi mọi thứ trôi qua nhanh chóng.", "Amanda cũng thoải mái với mối quan hệ này."]

2. Word Segmentation

F1 score: 94%

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import word_tokenize
>>> sentence = 'Chàng trai 9X Quảng Trị khởi nghiệp từ nấm sò'
>>> word_tokenize(sentence)
['Chàng trai', '9X', 'Quảng Trị', 'khởi nghiệp', 'từ', 'nấm', 'sò']
>>> word_tokenize(sentence, format="text")
'Chàng_trai 9X Quảng_Trị khởi_nghiệp từ nấm sò'
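The two output formats shown above appear to carry the same segmentation: in the text format, the syllables of a multi-syllable word are linked with underscores and words are separated by spaces. The sketch below reproduces that conversion from the list output. The helper name `to_text_format` is hypothetical and is not part of underthesea; it only illustrates the relationship between the two formats and runs without the library installed.

```python
def to_text_format(tokens):
    """Join tokens with spaces, linking the syllables inside each token with '_'.

    Hypothetical helper for illustration; not an underthesea function.
    """
    return " ".join(token.replace(" ", "_") for token in tokens)


# Token list copied from the word_tokenize example above.
tokens = ['Chàng trai', '9X', 'Quảng Trị', 'khởi nghiệp', 'từ', 'nấm', 'sò']
text_format = to_text_format(tokens)
print(text_format)
```

The underscore form is convenient when a downstream tool expects whitespace-separated tokens.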

3. POS Tagging

Accuracy: 92.3%

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import pos_tag
>>> pos_tag('Chợ thịt chó nổi tiếng ở Sài Gòn bị truy quét')
[('Chợ', 'N'), ('thịt', 'N'), ('chó', 'N'), ('nổi tiếng', 'A'), ('ở', 'E'), ('Sài Gòn', 'Np'), ('bị', 'V'), ('truy quét', 'V')]
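Because `pos_tag` returns plain `(word, tag)` tuples, the result can be post-processed with ordinary Python. The sketch below groups words by tag, using the output above as hard-coded input so it runs without the library. The helper `words_by_tag` is hypothetical, and the tag meanings in the comments (for example `N` for noun, `Np` for proper noun) are an assumption about the tag set rather than something stated in this description.

```python
from collections import defaultdict


def words_by_tag(tagged):
    """Group words by their POS tag, preserving sentence order within each group."""
    groups = defaultdict(list)
    for word, tag in tagged:
        groups[tag].append(word)
    return dict(groups)


# Output copied from the pos_tag example above.
tagged = [('Chợ', 'N'), ('thịt', 'N'), ('chó', 'N'), ('nổi tiếng', 'A'),
          ('ở', 'E'), ('Sài Gòn', 'Np'), ('bị', 'V'), ('truy quét', 'V')]
groups = words_by_tag(tagged)
print(groups['N'])   # words tagged N (assumed: common nouns)
print(groups['Np'])  # words tagged Np (assumed: proper nouns)
```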

4. Chunking

F1 score: 77%

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import chunk
>>> text = 'Bác sĩ bây giờ có thể thản nhiên báo tin bệnh nhân bị ung thư?'
>>> chunk(text)
[('Bác sĩ', 'N', 'B-NP'), ('bây giờ', 'P', 'I-NP'), ('có thể', 'R', 'B-VP'), ('thản nhiên', 'V', 'I-VP'), ('báo tin', 'N', 'B-NP'), ('bệnh nhân', 'N', 'I-NP'), ('bị', 'V', 'B-VP'), ('ung thư', 'N', 'I-VP'), ('?', 'CH', 'O')]
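The third element of each tuple looks like a BIO-style chunk tag: `B-` opens a chunk, `I-` continues it, and `O` marks a token outside any chunk. Under that assumption, the rows can be merged into whole phrases. The following is a minimal sketch, not underthesea code; `group_chunks` is a hypothetical helper and the input is the output above, hard-coded so the example runs on its own.

```python
def group_chunks(rows):
    """Merge (word, pos, chunk_tag) rows into (chunk_type, phrase) pairs."""
    chunks = []     # each entry is (chunk_type, [words])
    inside = False  # True while the previous row belonged to a chunk
    for word, _pos, tag in rows:
        prefix, _, chunk_type = tag.partition("-")
        if prefix == "I" and inside and chunks[-1][0] == chunk_type:
            # Continue the chunk that is currently open.
            chunks[-1][1].append(word)
        elif prefix in ("B", "I"):
            # "B-" opens a chunk; a stray "I-" is treated as an opening tag too.
            chunks.append((chunk_type, [word]))
        inside = prefix in ("B", "I")  # an "O" tag closes the open chunk
    return [(chunk_type, " ".join(words)) for chunk_type, words in chunks]


# Output copied from the chunk example above.
rows = [('Bác sĩ', 'N', 'B-NP'), ('bây giờ', 'P', 'I-NP'),
        ('có thể', 'R', 'B-VP'), ('thản nhiên', 'V', 'I-VP'),
        ('báo tin', 'N', 'B-NP'), ('bệnh nhân', 'N', 'I-NP'),
        ('bị', 'V', 'B-VP'), ('ung thư', 'N', 'I-VP'), ('?', 'CH', 'O')]
phrases = group_chunks(rows)
for chunk_type, phrase in phrases:
    print(chunk_type, phrase)
```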

5. Named Entity Recognition

F1 score: 86.6%

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import ner
>>> text = 'Chưa tiết lộ lịch trình tới Việt Nam của Tổng thống Mỹ Donald Trump'
>>> ner(text)
[('Chưa', 'R', 'O', 'O'), ('tiết lộ', 'V', 'B-VP', 'O'), ('lịch trình', 'V', 'B-VP', 'O'), ('tới', 'E', 'B-PP', 'O'), ('Việt Nam', 'Np', 'B-NP', 'B-LOC'), ('của', 'E', 'B-PP', 'O'), ('Tổng thống', 'N', 'B-NP', 'O'), ('Mỹ', 'Np', 'B-NP', 'B-LOC'), ('Donald', 'Np', 'B-NP', 'B-PER'), ('Trump', 'Np', 'B-NP', 'I-PER')]
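Each row returned by `ner` carries four fields; the last one appears to be a BIO-style entity tag such as `B-LOC` or `I-PER`. Assuming that reading, whole entity mentions can be recovered by joining a `B-` token with the `I-` tokens that follow it. The helper `extract_entities` below is a hypothetical sketch and not part of the library; the input is the output above, hard-coded so the example is self-contained.

```python
def extract_entities(rows):
    """Collect (entity_type, mention) pairs from (word, pos, chunk, entity_tag) rows."""
    entities = []
    current = None  # [entity_type, [words]] while inside an entity, else None
    for word, _pos, _chunk, tag in rows:
        if tag.startswith("B-"):
            current = [tag[2:], [word]]
            entities.append(current)
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current[1].append(word)
        else:
            # "O" (or an "I-" tag with no matching open entity) ends the entity.
            current = None
    return [(entity_type, " ".join(words)) for entity_type, words in entities]


# Output copied from the ner example above.
rows = [('Chưa', 'R', 'O', 'O'), ('tiết lộ', 'V', 'B-VP', 'O'),
        ('lịch trình', 'V', 'B-VP', 'O'), ('tới', 'E', 'B-PP', 'O'),
        ('Việt Nam', 'Np', 'B-NP', 'B-LOC'), ('của', 'E', 'B-PP', 'O'),
        ('Tổng thống', 'N', 'B-NP', 'O'), ('Mỹ', 'Np', 'B-NP', 'B-LOC'),
        ('Donald', 'Np', 'B-NP', 'B-PER'), ('Trump', 'Np', 'B-NP', 'I-PER')]
entities = extract_entities(rows)
print(entities)
```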

6. Text Classification

Accuracy: 86.7%

Install dependencies and download the default models

$ pip install git+https://github.com/facebookresearch/fastText.git@v0.2.0
$ pip install unidecode
$ underthesea download tc_general
$ underthesea download tc_bank

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import classify
>>> classify('HLV đầu tiên ở Premier League bị sa thải sau 4 vòng đấu')
['The thao']
>>> classify('Hội đồng tư vấn kinh doanh Asean vinh danh giải thưởng quốc tế')
['Kinh doanh']
>>> classify('Lãi suất từ BIDV rất ưu đãi', domain='bank')
['INTEREST_RATE']

7. Sentiment Analysis

F1 score: 59.5%

Install dependencies

$ pip install git+https://github.com/facebookresearch/fastText.git@v0.2.0
$ pip install unidecode
$ underthesea download sa_general
$ underthesea download sa_bank

Usage

>>> # -*- coding: utf-8 -*-
>>> from underthesea import sentiment
>>> sentiment('hàng kém chất lg,chăn đắp lên dính lông lá khắp người. thất vọng')
negative
>>> sentiment('Sản phẩm hơi nhỏ so với tưởng tượng nhưng chất lượng tốt, đóng gói cẩn thận.')
positive
>>> sentiment('Đky qua đường link ở bài viết này từ thứ 6 mà giờ chưa thấy ai lhe hết', domain='bank')
['CUSTOMER_SUPPORT#negative']
>>> sentiment('Xem lại vẫn thấy xúc động và tự hào về BIDV của mình', domain='bank')
['TRADEMARK#positive']
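With `domain='bank'`, the labels in the output above combine an aspect and a polarity in one string, joined by `#`. Assuming that format holds for every label, they can be split into pairs for further processing. The helper `split_aspect_labels` is hypothetical and not part of underthesea; the label list simply combines the two outputs shown above for illustration.

```python
def split_aspect_labels(labels):
    """Split 'ASPECT#polarity' strings into (aspect, polarity) tuples.

    A label without '#' yields an empty aspect and the whole label as polarity.
    """
    pairs = []
    for label in labels:
        aspect, _, polarity = label.rpartition("#")
        pairs.append((aspect, polarity))
    return pairs


# Labels copied from the two domain='bank' examples above.
labels = ['CUSTOMER_SUPPORT#negative', 'TRADEMARK#positive']
pairs = split_aspect_labels(labels)
print(pairs)
```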

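Upcoming Features

Text to Speech
Automatic Speech Recognition
Machine Translation
Dependency Parsing

Contributing

Do you want to contribute to the development of underthesea? Great! Please read CONTRIBUTING.rst for more details.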

History

1.1.16 (2019-06-15)

Bump the version of the languageflow dependency (GH-231)
Update to scikit-learn 0.20.2 (GH-229)
Update dependencies (GH-241)
Update the model trained on the VNTC dataset (GH-246)
Update the model trained on the UTS2017_BANK_TC dataset (GH-243)
Update the model trained on the UTS2017_BANK_SA dataset (GH-244)
Fix errors with sentiment sentences in the demo (GH-236)
Unify model naming and management (GH-225)

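1.1.12 (2019-03-13)

Add sentence segmentation feature

1.1.9 (2019-01-01)

Improve the speed of the word_tokenize function
Support only Python 3.6+
Use flake8 for style guide enforcement

1.1.8 (2018-06-20)

Fix word_tokenize error when the text contains a tab (\t) character
Fix regex_tokenize with URLs

1.1.7 (2018-04-12)

Rename the word_sent function to word_tokenize
Refactor version handling in setup.py and the init file
Update documentation badge URL

1.1.6 (2017-12-26)

New feature: aspect sentiment analysis
Integrate with LanguageFlow 1.1.6
Fix bug when tokenizing strings containing '=' (#159)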

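1.1.5 (2017-10-12)

New feature: named entity recognition
Refactor and update models for word_sent, pos_tag, and chunking

1.1.4 (2017-09-12)

New feature: text classification
[bug] Fix text error
[doc] Add Facebook link

1.1.3 (2017-08-30)

Add live demo: https://underthesea.herokuapp.com/

1.1.2 (2017-08-22)

Add dictionary

1.1.1 (2017-07-05)

Support Python 3
Refactor feature engineering code

1.1.0 (2017-05-30)

Add chunking feature
Add pos_tag feature
Add word_sent feature, fix performance
Add Corpus class
Add Transformer classes
Integrate with the Ho Ngoc Duc dictionary
Add Travis CI, automatic build with PyPI

1.0.0 (2017-03-01)

First release on PyPI.
First release on Read the Docs.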

underthesea 1.1.17