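Python test-mark package - PyPI

A simple function for mathematical operations

Detailed description of the test-mark Python project

Bangla Feature Extractor (BFE)

BFE is a Bangla Natural Language Processing based feature extractor.

Current Features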

- CountVectorizer
- TfIdf
- Word Embedding
  - Word2Vec
  - FastText

Installation

    pip install bfe

Example

1. CountVectorizer

- Fit n Transform
- Transform
- Get Wordset

Fit n Transform

(The code example for this step is missing from the source.)

Transform

    from bfe import CountVectorizer

    ct = CountVectorizer()
    get_mat = ct.transform("রাহাত")
    # Output: the count-vectorized matrix form of the given word

Get Wordset

    from bfe import CountVectorizer

    ct = CountVectorizer()
    ct.get_wordSet()
    # Output: the raw word set used to train the model
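To make the idea behind a count vectorizer concrete, here is a minimal pure-Python sketch. It does not use BFE and is not its implementation: the helper names `build_vocabulary` and `count_vectorize` are hypothetical, and whitespace tokenization is an assumption.

```python
from collections import Counter


def build_vocabulary(docs):
    """Map each distinct whitespace-separated token to a column index."""
    vocab = {}
    for doc in docs:
        for token in doc.split():
            vocab.setdefault(token, len(vocab))
    return vocab


def count_vectorize(docs, vocab):
    """Return one row per document; each cell holds a raw token count."""
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        # dicts keep insertion order, so columns follow the vocabulary indices
        rows.append([counts.get(token, 0) for token in vocab])
    return rows


docs = ["আমার প্রিয় জন্মভূমি", "বাংলা আমার মাতৃভাষা"]
vocab = build_vocabulary(docs)   # plays the role of the word set
matrix = count_vectorize(docs, vocab)
print(list(vocab))
print(matrix)
```

The first word of both documents is the same, so it shares column 0; every other word gets its own column.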

2. TfIdf

- Fit n Transform
- Transform
- Coefficients

Fit n Transform

    from bfe import TfIdfVectorizer

    k = TfIdfVectorizer()
    doc = ["কাওছার আহমেদ", "শুভ হাইদার"]
    matrix1 = k.fit_transform(doc)
    print(matrix1)
    '''
    Output:
    [[0.150515 0.150515 0.       0.      ]
     [0.       0.       0.150515 0.150515]]
    '''

Transform

    from bfe import TfIdfVectorizer

    k = TfIdfVectorizer()
    doc = ["আহমেদ সুমন", "কাওছার করিম"]
    matrix2 = k.transform(doc)
    print(matrix2)
    '''
    Output:
    [[0.150515 0.       0.       0.      ]
     [0.       0.150515 0.       0.      ]]
    '''

Coefficients

    from bfe import TfIdfVectorizer

    k = TfIdfVectorizer()
    doc = ["কাওছার আহমেদ", "শুভ হাইদার"]
    k.fit_transform(doc)
    wordset, idf = k.coefficients()
    print(wordset)
    # Output: ['আহমেদ', 'কাওছার', 'হাইদার', 'শুভ']
    print(idf)
    '''
    Output:
    {'আহমেদ': 0.3010299956639812, 'কাওছার': 0.3010299956639812, 'হাইদার': 0.3010299956639812, 'শুভ': 0.3010299956639812}
    '''
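The numbers above are consistent with a plain TF-IDF weighting in which tf is the token count divided by the document length and idf is log10(N / document frequency): log10(2 / 1) ≈ 0.30103, and 0.5 × 0.30103 ≈ 0.150515. The sketch below reproduces them in pure Python. The formula is inferred from the printed outputs rather than taken from BFE's code, the helper names are hypothetical, and the column order follows first appearance, so it may differ from the library's word set order.

```python
import math


def fit_idf(docs):
    """Compute idf = log10(N / document_frequency) for every token."""
    doc_freq = {}
    for doc in docs:
        # dict.fromkeys de-duplicates tokens while keeping first-seen order
        for token in dict.fromkeys(doc.split()):
            doc_freq[token] = doc_freq.get(token, 0) + 1
    return {t: math.log10(len(docs) / df) for t, df in doc_freq.items()}


def tfidf_row(doc, idf):
    """tf = count / document length; tokens outside the fitted set are ignored."""
    tokens = doc.split()
    return [tokens.count(t) / len(tokens) * weight for t, weight in idf.items()]


docs = ["কাওছার আহমেদ", "শুভ হাইদার"]
idf = fit_idf(docs)
matrix = [tfidf_row(doc, idf) for doc in docs]
for row in matrix:
    print([round(value, 6) for value in row])
```

Because `tfidf_row` only looks at tokens that were seen during fitting, a word that appears only at transform time contributes nothing, which matches the Transform output shown earlier.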

3. Word Embedding

- Word2Vec
  - Training
  - Get Word Vector
  - Get Similarity
  - Get n Similar Words
  - Get Middle Word
  - Get Odd Words
  - Get Similarity Plot

Training

    from bfe import BN_Word2Vec

    # Training against sentences
    w2v = BN_Word2Vec(sentences=[['আমার', 'প্রিয়', 'জন্মভূমি'], ['বাংলা', 'আমার', 'মাতৃভাষা']])
    w2v.train_Word2Vec()

    # Training against one dataset
    w2v = BN_Word2Vec(corpus_file="path to data or txt file")
    w2v.train_Word2Vec()

    # Training against multiple datasets
    '''
        path
          ->data
            ->1.txt
            ->2.txt
            ->3.txt
    '''
    w2v = BN_Word2Vec(corpus_path="path/data")
    w2v.train_Word2Vec(epochs=25)

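After training completes, the model "w2v_model" and its supporting vector files are saved to the current directory.

If you use a pre-trained model, specify it when initializing BN_Word2Vec(). Otherwise no model_name is needed.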
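For the multiple-dataset layout shown in the training example, a directory of .txt files has to be turned into tokenized sentences before training. The following is only a sketch of how that could look. `read_sentences` is a hypothetical helper, not part of BFE, and it assumes one sentence per line, whitespace tokenization, and UTF-8 files.

```python
import tempfile
from pathlib import Path


def read_sentences(corpus_path):
    """Collect one token list per non-empty line from every .txt file."""
    sentences = []
    for txt_file in sorted(Path(corpus_path).glob("*.txt")):
        with txt_file.open(encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()
                if tokens:  # skip blank lines
                    sentences.append(tokens)
    return sentences


# Demo with a temporary directory standing in for path/data
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "1.txt").write_text("আমার প্রিয় জন্মভূমি\n", encoding="utf-8")
    Path(tmp, "2.txt").write_text("বাংলা আমার মাতৃভাষা\n\n", encoding="utf-8")
    sentences = read_sentences(tmp)

print(sentences)
```

The resulting list has the same shape as the `sentences` argument used in the first training example.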

Get Word Vector

    from bfe import BN_Word2Vec

    w2v = BN_Word2Vec(model_name='give the model name here')
    w2v.get_wordVector('আমার')

Get Similarity

    from bfe import BN_Word2Vec

    w2v = BN_Word2Vec(model_name='give the model name here')
    w2v.get_similarity('ঢাকা', 'রাজধানী')
    # Output: 67.457879

Get n Similar Words

    from bfe import BN_Word2Vec

    w2v = BN_Word2Vec(model_name='give the model name here')
    w2v.get_n_similarWord(['পদ্মা'], n=10)
    '''
    Output:
    [('সেতুর', 0.5857524275779724),
     ('মুলফৎগঞ্জ', 0.5773632526397705),
     ('মহানন্দা', 0.5634652376174927),
     ("'পদ্মা", 0.5617109537124634),
     ('গোমতী', 0.5605217218399048),
     ('পদ্মার', 0.5547558069229126),
     ('তুলসীগঙ্গা', 0.5274507999420166),
     ('নদীর', 0.5232067704200745),
     ('সেতু', 0.5225246548652649),
     ('সেতুতে', 0.5192927718162537)]
    '''
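Similarity between word vectors is commonly measured with cosine similarity, and a "most similar words" list is then just the top of a ranking by that score. The sketch below illustrates this with hypothetical three-dimensional toy vectors and English placeholder words. Whether BFE uses exactly this measure, and how it scales the single-pair score (the 67.457879 above looks like a percentage), is an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, n=2):
    """Rank every other word by cosine similarity to `word`, best first."""
    scores = [
        (other, cosine_similarity(vectors[word], vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:n]


# Hypothetical toy vectors; real embeddings have hundreds of dimensions
toy_vectors = {
    "river": [0.9, 0.1, 0.0],
    "bridge": [0.8, 0.3, 0.1],
    "stream": [0.85, 0.05, 0.05],
    "sky": [0.0, 0.2, 0.9],
}
neighbours = most_similar("river", toy_vectors, n=2)
print(neighbours)
```

The returned list has the same (word, score) shape as the output shown above.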

Get Middle Word

Get the probability distribution of the center word, given a list of context words.

    from bfe import BN_Word2Vec

    w2v = BN_Word2Vec(model_name='give the model name here')
    w2v.get_outputWord(['ঢাকায়', 'মৃত্যু'], n=2)
    # Output:  [("হয়েছে।',", 0.05880642), ('শ্রমিকের', 0.05639163)]
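One common way to obtain such a distribution, in the style of a CBOW model, is to average the context word vectors, take the dot product with each candidate's output vector, and normalize with a softmax. The sketch below shows that idea only. All vectors and word labels are hypothetical, and it is an assumption that BFE computes its result this way.

```python
import math


def predict_center_word(context, input_vecs, output_vecs, n=2):
    """Return the n most probable center words for the given context words."""
    dim = len(next(iter(input_vecs.values())))
    # Average the input vectors of the context words
    mean = [sum(input_vecs[w][i] for w in context) / len(context) for i in range(dim)]
    # Score every candidate with a dot product, then apply a softmax
    scores = {w: sum(a * b for a, b in zip(mean, vec)) for w, vec in output_vecs.items()}
    total = sum(math.exp(s) for s in scores.values())
    probs = [(w, math.exp(s) / total) for w, s in scores.items()]
    probs.sort(key=lambda pair: pair[1], reverse=True)
    return probs[:n]


# Hypothetical two-dimensional vectors for illustration
input_vecs = {"ctx1": [1.0, 0.0], "ctx2": [0.0, 1.0]}
output_vecs = {"x": [1.0, 1.0], "y": [-1.0, 0.0], "z": [0.0, 0.5]}
top = predict_center_word(["ctx1", "ctx2"], input_vecs, output_vecs, n=2)
print(top)
```

The probabilities over all candidates sum to 1, which is why the scores for individual words in a large vocabulary tend to be small, as in the output above.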

Get Odd Words

Get the word that least matches the others in the given list.

    from bfe import BN_Word2Vec

    w2v = BN_Word2Vec(model_name='give the model name here')
    w2v.get_oddWords(['চাল', 'ডাল', 'চিনি', 'আকাশ'])
    # Output: 'আকাশ'
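A typical way to find the odd word is to normalize every word vector, average them, and pick the word whose vector is least aligned with that average. The sketch below follows this approach with hypothetical two-dimensional vectors. It illustrates the idea and is not taken from BFE's code.

```python
import math


def unit(vector):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def odd_one_out(vectors):
    """Return the word whose unit vector has the lowest dot product with the mean."""
    units = {word: unit(vec) for word, vec in vectors.items()}
    dim = len(next(iter(units.values())))
    mean = [sum(u[i] for u in units.values()) / len(units) for i in range(dim)]
    return min(units, key=lambda word: sum(a * b for a, b in zip(units[word], mean)))


# Hypothetical vectors: three food words point one way, the fourth another
toy = {
    'চাল': [0.9, 0.1],
    'ডাল': [0.8, 0.2],
    'চিনি': [0.85, 0.15],
    'আকাশ': [0.1, 0.9],
}
odd = odd_one_out(toy)
print(odd)
```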

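Get Similarity Plot

Creates a bar plot of similar words with their probabilities.

    from bfe import BN_Word2Vec

    w2v = BN_Word2Vec(model_name='give the model name here')
    # The source repeats the get_oddWords call here; the name of the plotting method is not given.
    w2v.get_oddWords(['চাল', 'ডাল', 'চিনি', 'আকাশ'])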

- FastText
  - Training
  - Get Word Vector
  - Get Similarity
  - Get n Similar Words
  - Get Middle Word
  - Get Odd Words

Training

    from bfe import BN_FastText

    # Training against sentences
    ft = BN_FastText(sentences=[['আমার', 'প্রিয়', 'জন্মভূমি'], ['বাংলা', 'আমার', 'মাতৃভাষা']])
    ft.train_fasttext()

    # Training against one dataset
    ft = BN_FastText(corpus_file="path to data or txt file")
    ft.train_fasttext()

    # Training against multiple datasets
    '''
        path
          ->data
            ->1.txt
            ->2.txt
            ->3.txt
    '''
    ft = BN_FastText(corpus_path="path/data")
    ft.train_fasttext(epochs=25)

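After training completes, the model "ft_model" and its supporting vector files are saved to the current directory.

If you use a pre-trained model, specify it when initializing BN_FastText(). Otherwise no model_name is needed.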

Get Word Vector

    from bfe import BN_FastText

    ft = BN_FastText(model_name='give the model name here')
    ft.get_wordVector('আমার')

Get Similarity

    from bfe import BN_FastText

    ft = BN_FastText(model_name='give the model name here')
    ft.get_similarity('ঢাকা', 'রাজধানী')
    # Output: 70.56821120

Get n Similar Words

(The code example for this step is missing from the source.)

Get Odd Words

Get the word that least matches the others in the given list.

    from bfe import BN_FastText

    ft = BN_FastText(model_name='give the model name here')
    ft.get_oddWords(['চাল', 'ডাল', 'চিনি', 'আকাশ'])
    # Output: 'আকাশ'

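Get Similarity Plot

Creates a bar plot of similar words with their probabilities.

    from bfe import BN_FastText

    ft = BN_FastText(model_name='give the model name here')
    # The source repeats the get_oddWords call here; the name of the plotting method is not given.
    ft.get_oddWords(['চাল', 'ডাল', 'চিনি', 'আকাশ'])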

test-mark 0.1
