# theano-word2vec
An implementation of Mikolov's word2vec in Python 2, using Theano and Lasagne.
## About this package
This package was written with the modularity of its components in mind, in the hope that they can be reused when creating variants of the standard word2vec. Full documentation, guides for customizing and extending the package, and a setup tutorial are coming soon. For now, enjoy this quick-start guide.
## Quick start
Note: this package currently only works with Python 2.
### Install
Install from the Python Package Index:
```bash
pip install theano-word2vec
```
Or, install a hackable version:
```bash
git clone https://github.com/enewe101/word2vec.git
cd word2vec
python setup.py develop
```
### Usage
The simplest way to train a word2vec embedding:
```python
>>> from word2vec import word2vec
>>> embedder, dictionary = word2vec(files=['corpus/file1.txt', 'corpus/file2.txt'])
```
The input files should be formatted with one sentence per line, with tokens separated by spaces.
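As a concrete illustration of that corpus format (the file name and sentences below are made up for the example, not part of the package), a tiny corpus file could be prepared like this:

```python
import os
import tempfile

# One sentence per line, tokens separated by spaces.
sentences = [
    "the quick brown fox jumps over the lazy dog",
    "a sentence to embed",
]

# Write a tiny corpus file in the expected format (the path is illustrative).
corpus_path = os.path.join(tempfile.mkdtemp(), "file1.txt")
with open(corpus_path, "w") as f:
    f.write("\n".join(sentences) + "\n")

# Each line then splits cleanly back into its tokens.
with open(corpus_path) as f:
    tokenized = [line.split() for line in f]

print(tokenized[1])  # ['a', 'sentence', 'to', 'embed']
```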
Once trained, the embedder can be used to convert words into vectors:
```python
>>> tokens = 'A sentence to embed'.split()
>>> token_ids = dictionary.get_ids(tokens)
>>> vectors = embedder.embed(token_ids)
```
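The package does not prescribe how to compare the resulting vectors, but cosine similarity is the usual choice for word embeddings. A minimal sketch using numpy, with toy vectors standing in for real embedder output:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy vectors standing in for the embedder's output.
king = np.array([0.9, 0.1, 0.4])
queen = np.array([0.85, 0.2, 0.45])
banana = np.array([-0.2, 0.9, -0.1])

print(cosine_similarity(king, queen))   # close to 1: similar directions
print(cosine_similarity(king, banana))  # much lower: dissimilar directions
```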
The word2vec() function exposes most of the basic parameters of Mikolov's skip-gram model based on noise-contrastive estimation:
```python
>>> embedder, dictionary = word2vec(
...     # Directory in which to save embedding parameters (deepest dir created if it doesn't exist)
...     savedir='data/my-embedding',
...
...     # List of files comprising the corpus
...     files=['corpus/file1.txt', 'corpus/file2.txt'],
...
...     # Include whole directories of files (deep files not included)
...     directories=['corpus', 'corpus/additional'],
...
...     # Indicate files to exclude using regexes
...     skip=[re.compile(r'.*\.bk$'), re.compile('exclude-from-corpus')],
...
...     # Number of passes through the training corpus
...     num_epochs=5,
...
...     # Specify the mapping from tokens to ints (else create it automatically)
...     unigram_dictionary=preexisting_dictionary,
...
...     # Number of "noise" examples included for every "signal" example
...     noise_ratio=15,
...
...     # Relative probability of skip-gram sampling centered on the query word
...     kernel=[1, 2, 3, 3, 2, 1],
...
...     # Threshold used to calculate the discard probability for query words
...     t=1.0e-5,
...
...     # Size of minibatches during training
...     batch_size=1000,
...
...     # Dimensionality of the embedding vector space
...     num_embedding_dimensions=500,
...
...     # Initializer for embedding parameters (can also be a numpy array)
...     word_embedding_init=lasagne.init.Normal(),
...
...     # Initializer for context embedding parameters (can also be a numpy array)
...     context_embedding_init=lasagne.init.Normal(),
...
...     # Size of stochastic gradient descent steps during training
...     learning_rate=0.1,
...
...     # Amount of Nesterov momentum during training
...     momentum=0.9,
...
...     # Print messages during training
...     verbose=True,
...
...     # Number of parallel corpus-reading processes
...     num_example_generators=3
... )
```
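The `t` parameter above is described as a frequency threshold for discarding query words, which matches Mikolov's subsampling scheme: a word with corpus frequency f is discarded with probability 1 - sqrt(t / f), so very frequent words are aggressively dropped while rare words are always kept. A sketch of that computation (assuming this package follows the standard formula; the toy corpus and the larger t value here are for illustration only):

```python
import math
from collections import Counter

def discard_probability(freq, t=1.0e-5):
    # Standard Mikolov subsampling: a word with frequency `freq` is
    # discarded with probability 1 - sqrt(t / freq); words with
    # freq <= t are always kept (probability clamped to 0).
    return max(0.0, 1.0 - math.sqrt(t / freq))

# A toy corpus; a larger t than the real default keeps the numbers readable.
corpus = "the cat sat on the mat the end".split() * 1000
counts = Counter(corpus)
total = float(len(corpus))

for word in ("the", "cat"):
    f = counts[word] / total
    print("%s %.3f" % (word, discard_probability(f, t=1.0e-3)))
```

Frequent words like "the" end up with a much higher discard probability than less frequent ones, which speeds up training and improves the embeddings of rarer words.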
For further customization, see the documentation (coming soon) to learn how to assemble your own training setup from the classes provided in word2vec.