textaugment: Python package - PyPI

A library for augmenting text for natural language processing applications.

Project description for textaugment

TextAugment: Improving short text classification through global augmentation methods

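TextAugment is a Python 3 library for augmenting text for natural language processing applications. TextAugment stands on the giant shoulders of NLTK, Gensim, and TextBlob and plays nicely with them.

Citation

Improving short text classification through global augmentation methods, published at MLDM 2019.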


Requirements

Python 3

The following packages are dependencies and will be installed automatically.

$ pip install numpy nltk gensim textblob googletrans

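The following code downloads the NLTK WordNet corpus.

import nltk
nltk.download('wordnet')

The following code downloads the NLTK tokenizer. This tokenizer divides a text into a list of sentences by using an unsupervised algorithm to build a model for abbreviations, collocations, and words that start sentences.

nltk.download('punkt')

The following code downloads the default NLTK part-of-speech tagger model. A part-of-speech tagger processes a sequence of words and attaches a part-of-speech tag to each word.

nltk.download('averaged_perceptron_tagger')

Use gensim to load a pre-trained word2vec model, such as the Google News vectors hosted on Google Drive.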

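import gensim
# KeyedVectors.load_word2vec_format replaces the deprecated Word2Vec.load_word2vec_format.
model = gensim.models.KeyedVectors.load_word2vec_format('./GoogleNews-vectors-negative300.bin', binary=True)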

Alternatively, train one from scratch using your own data or a public dataset.

Installation

Install from pip [recommended]

$ pip install textaugment
or install the latest release
$ pip install git+https://github.com/dsfsi/textaugment.git

Install from source

$ git clone git@github.com:dsfsi/textaugment.git
$ cd textaugment
$ python setup.py install

How to use

Three types of augmentation are available:

word2vec

from textaugment import Word2vec

WordNet

from textaugment import Wordnet

Translation (this requires internet access)

from textaugment import Translate

Word2vec-based augmentation

Basic example

>>> from textaugment import Word2vec
>>> t = Word2vec(model='path/to/gensim/model' or 'gensim model itself')
>>> t.augment('The stories are good')
The films are good
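To make the idea behind the example above concrete, here is a minimal sketch of replacing a word with its nearest neighbour in an embedding space. It is not TextAugment's implementation: the three-dimensional vectors and the helper names `cosine`, `nearest`, and `replace_word` are invented for illustration, and a real word2vec model has hundreds of dimensions.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
TOY_VECTORS = {
    "stories": (0.9, 0.1, 0.0),
    "films": (0.8, 0.2, 0.1),
    "good": (0.1, 0.9, 0.2),
    "excellent": (0.2, 0.8, 0.3),
    "town": (0.0, 0.1, 0.9),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest(word, vectors):
    """Return the most similar other word, or the word itself if it is unknown."""
    if word not in vectors:
        return word
    candidates = [w for w in vectors if w != word]
    return max(candidates, key=lambda w: cosine(vectors[word], vectors[w]))


def replace_word(sentence, target, vectors):
    """Replace every occurrence of `target` with its nearest neighbour."""
    tokens = sentence.split()
    return " ".join(nearest(t, vectors) if t == target else t for t in tokens)


print(replace_word("The stories are good", "stories", TOY_VECTORS))
# prints: The films are good
```

Words missing from the vocabulary are left unchanged here, which is one simple way to handle out-of-vocabulary tokens.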

Advanced example

>>> runs = 1  # By default.
>>> v = False  # verbose mode to replace all the words. If enabled, runs is not effective. Used in this paper (https://www.cs.cmu.edu/~diyiy/docs/emnlp_wang_2015.pdf)
>>> p = 0.5  # The probability of success of an individual trial (0.1 < p < 1.0); default is 0.5. Used by the geometric distribution to select words from a sentence.
>>> t = Word2vec(model='path/to/gensim/model' or 'gensim model itself', runs=5, v=False, p=0.5)
>>> t.augment('The stories are good')
The movies are excellent
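The description of `p` is terse. One reading, which is an assumption and not something the text above confirms, is that each word gets its own geometric draw and is chosen for replacement when the draw equals 1 (success on the first trial), so that a word is chosen with probability `p`. The sketch below illustrates that reading using only the standard library; `geometric` and `select_words` are hypothetical helper names.

```python
import random


def geometric(p, rng):
    """Number of Bernoulli(p) trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:
        trials += 1
    return trials


def select_words(words, p=0.5, rng=None):
    """Keep each word whose geometric draw succeeds on the first trial."""
    rng = rng or random.Random()
    return [w for w in words if geometric(p, rng) == 1]


rng = random.Random(0)  # seeded only so that repeated runs agree
words = "The stories are good".split()
print(select_words(words, p=0.5, rng=rng))
```

Under this reading, a larger `p` selects more words on average, and the selected words always keep their original order.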

WordNet-based augmentation

Basic example

>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('wordnet')
>>> from textaugment import Wordnet
>>> t = Wordnet()
>>> t.augment('In the afternoon, John is going to town')
In the afternoon, John is walking to town

Advanced example

>>> v = True  # enable verb augmentation. Default is True.
>>> n = False  # enable noun augmentation. Default is False.
>>> runs = 1  # number of times to augment a sentence. Default is 1.
>>> p = 0.5  # The probability of success of an individual trial (0.1 < p < 1.0); default is 0.5. Used by the geometric distribution to select words from a sentence.
>>> t = Wordnet(v=False, n=True, p=0.5)
>>> t.augment('In the afternoon, John is going to town')
In the afternoon, Joseph is going to town.
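The `v` and `n` flags decide which parts of speech are eligible for replacement. The sketch below shows that gating on hand-tagged tokens. It is an illustration under stated assumptions, not the library's code: the synonym table is invented, the Penn Treebank tags are written by hand instead of coming from a tagger, and `coarse_pos` and `augment_tagged` are hypothetical names.

```python
# Toy synonym table keyed by (lowercased word, coarse part of speech).
# Invented for illustration; a real system would look synonyms up in WordNet.
TOY_SYNONYMS = {
    ("going", "verb"): ["walking"],
    ("john", "noun"): ["Joseph"],
}


def coarse_pos(tag):
    """Map a Penn Treebank tag to 'verb', 'noun', or None."""
    if tag.startswith("VB"):
        return "verb"
    if tag.startswith("NN"):
        return "noun"
    return None


def augment_tagged(tagged, v=True, n=False, synonyms=TOY_SYNONYMS):
    """Replace verbs and/or nouns with their first listed synonym, if any."""
    out = []
    for word, tag in tagged:
        pos = coarse_pos(tag)
        enabled = (pos == "verb" and v) or (pos == "noun" and n)
        options = synonyms.get((word.lower(), pos), []) if enabled else []
        out.append(options[0] if options else word)
    return " ".join(out)


# Hand-written tags for the example sentence fragment.
TAGGED = [("John", "NNP"), ("is", "VBZ"), ("going", "VBG"),
          ("to", "TO"), ("town", "NN")]

print(augment_tagged(TAGGED, v=True, n=False))   # prints: John is walking to town
print(augment_tagged(TAGGED, v=False, n=True))   # prints: Joseph is going to town
```

Words with no entry in the table pass through unchanged, so enabling a flag never forces a replacement.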

RTT-based augmentation

Example

>>> src = "en"  # source language of the sentence
>>> to = "fr"  # target language
>>> from textaugment import Translate
>>> t = Translate(src="en", to="fr")
>>> t.augment('In the afternoon, John is going to town')
In the afternoon John goes to town
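Assuming RTT stands for round-trip translation, the technique translates a sentence into a pivot language and then back, and the paraphrase that returns is the augmented text. The sketch below shows only that control flow. The `translate` callable and the toy phrase table are stand-ins invented for illustration; they are not the translation backend the library uses, and a real run needs an online translation service.

```python
from typing import Callable

# Hypothetical translator signature: (text, source_lang, target_lang) -> text.
Translator = Callable[[str, str, str], str]


def round_trip(sentence: str, translate: Translator,
               src: str = "en", to: str = "fr") -> str:
    """Translate src -> to, then back to -> src, and return the result."""
    pivot = translate(sentence, src, to)
    return translate(pivot, to, src)


# Toy phrase table standing in for a real translation service.
_TOY_TABLE = {
    ("en", "fr"): {"John is going to town": "John va en ville"},
    ("fr", "en"): {"John va en ville": "John goes to town"},
}


def toy_translate(text: str, source: str, target: str) -> str:
    """Look the whole sentence up; return it unchanged if it is not listed."""
    return _TOY_TABLE[(source, target)].get(text, text)


print(round_trip("John is going to town", toy_translate))
# prints: John goes to town
```

Passing the translator in as a parameter keeps the round-trip logic testable offline.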

Built on

Python

Authors

Acknowledgements

Cite this paper when using this library.

License

MIT licensed. See the bundled LICENCE file for more details.


textaugment 1.1
