ekushey: Python package on PyPI

Project description

Ekushey

"Ekushey" is the first structured and cost-effective Bangla natural language processing toolkit.

Current Modules

Feature Extraction

feature_extraction is a feature extractor for Bangla natural language processing. It provides:

CountVectorizer
HashVectorizer
TfIdf
Word Embedding

Installation

The package is published on PyPI under the name ekushey and can be installed with pip.

Examples

1. CountVectorizer

Fit and Transform
Transform
Get Wordset

Fit and Transform

from ekushey.feature_extraction import CountVectorizer
ct = CountVectorizer()
X = ct.fit_transform(X)  # X is the word features

Output:

The count-vectorized matrix form of the given features.

Transform

from ekushey.feature_extraction import CountVectorizer
ct = CountVectorizer()
get_mat = ct.transform("রাহাত")

Output:

The count-vectorized matrix form of the given word.

Get Wordset

from ekushey.feature_extraction import CountVectorizer
ct = CountVectorizer()
ct.get_wordSet()

Output:

The raw wordset used to train the model.
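
To make the idea behind count vectorization concrete, here is a minimal pure-Python sketch. It does not use ekushey: the helper names `build_vocabulary` and `count_vectorize`, the whitespace tokenization, and the sorted column order are all assumptions made for illustration, and the library's own tokenization and return type may differ.

```python
from collections import Counter

def build_vocabulary(docs):
    """Map each distinct whitespace-separated token to a column index."""
    tokens = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: idx for idx, tok in enumerate(tokens)}

def count_vectorize(docs, vocabulary):
    """Return one row per document; each cell counts one vocabulary token."""
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        # Dict iteration follows insertion order, i.e. the column order.
        rows.append([counts.get(tok, 0) for tok in vocabulary])
    return rows

docs = ["আমার বাংলা", "আমার দেশ আমার ভাষা"]
vocab = build_vocabulary(docs)      # hypothetical stand-in for a fitted wordset
matrix = count_vectorize(docs, vocab)
```

Each row of `matrix` has one column per vocabulary entry, so a word that appears twice in a document contributes a 2 in its column.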

2. HashVectorizer

Fit and Transform
Transform

from ekushey.feature_extraction import HashVectorizer
corpus = ['আমাদের দেশ বাংলাদেশ', 'আমার বাংলা']
Vectorizer = HashVectorizer()
n_features = 8
X = Vectorizer.fit_transform(corpus, n_features)
corpus_t = ["আমাদের দেশ অনেক সুন্দর"]
Xf = Vectorizer.transform(corpus_t)
print(X.shape, Xf.shape)
print("=====================================")
print(X)
print("=====================================")
print(Xf)

Output:

(2, 8) (1, 8)
=====================================
  (0, 7)	-1.0
  (1, 7)	-1.0
=====================================
  (0, 0)	0.5773502691896258
  (0, 2)	0.5773502691896258
  (0, 7)	-0.5773502691896258
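
The output above has a fixed width of 8 columns, signed entries, and values such as 0.5773502691896258, which equals 1/sqrt(3). That pattern is what signed feature hashing with L2-normalized rows would produce, although this is an inference from the printed numbers rather than something the source states. The sketch below illustrates the technique under that assumption; it uses an MD5-based hash chosen only for reproducibility, so bucket positions will not match the library's, and `hash_vectorize` is a hypothetical name.

```python
import hashlib
import math

def hash_vectorize(docs, n_features=8):
    """Signed feature hashing followed by L2 normalization of each row."""
    rows = []
    for doc in docs:
        row = [0.0] * n_features
        for token in doc.split():
            digest = hashlib.md5(token.encode("utf-8")).digest()
            value = int.from_bytes(digest[:8], "big")
            bucket = value % n_features                 # column for this token
            sign = 1.0 if (value >> 63) & 1 == 0 else -1.0  # pseudo-random sign
            row[bucket] += sign
        norm = math.sqrt(sum(x * x for x in row))
        # Leave all-zero rows alone (empty text or fully cancelled collisions).
        rows.append([x / norm for x in row] if norm else row)
    return rows

vectors = hash_vectorize(["আমাদের দেশ অনেক সুন্দর"], n_features=8)
```

With this scheme, three tokens that land in three distinct buckets each end up with magnitude 1/sqrt(3), while tokens that collide with opposite signs cancel out.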

Get Wordset

3. TfIdf

Fit and Transform
Transform
Coefficients

Fit and Transform

from ekushey.feature_extraction import TfIdfVectorizer
k = TfIdfVectorizer()
doc = ["কাওছার আহমেদ", "শুভ হাইদার"]
matrix1 = k.fit_transform(doc)
print(matrix1)

Output:

[[0.150515 0.150515 0.       0.      ]
 [0.       0.       0.150515 0.150515]]

Transform

from ekushey.feature_extraction import TfIdfVectorizer
k = TfIdfVectorizer()
doc = ["আহমেদ সুমন", "কাওছার করিম"]
matrix2 = k.transform(doc)
print(matrix2)

Output:

[[0.150515 0.       0.       0.      ]
 [0.       0.150515 0.       0.      ]]

Coefficients

from ekushey.feature_extraction import TfIdfVectorizer
k = TfIdfVectorizer()
doc = ["কাওছার আহমেদ", "শুভ হাইদার"]
k.fit_transform(doc)
wordset, idf = k.coefficients()
print(wordset)
# Output: ['আহমেদ', 'কাওছার', 'হাইদার', 'শুভ']
print(idf)
'''
Output: {'আহমেদ': 0.3010299956639812, 'কাওছার': 0.3010299956639812, 'হাইদার': 0.3010299956639812, 'শুভ': 0.3010299956639812}
'''
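
The numbers printed in this section are consistent with a simple weighting: term frequency tf = count / document length, and inverse document frequency idf = log10(N / df). With two documents and every word occurring in exactly one of them, idf = log10(2) ≈ 0.30103 and tf = 1/2, giving 0.150515. This formula is inferred from the outputs, not taken from the library's source, so the sketch below is only an illustration; `tf_idf` is a hypothetical helper and its sorted column order is an assumption.

```python
import math

def tf_idf(docs):
    """Return (wordset, idf, matrix) with tf = count/len(doc), idf = log10(N/df)."""
    tokenized = [doc.split() for doc in docs]
    wordset = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    idf = {
        word: math.log10(n_docs / sum(word in toks for toks in tokenized))
        for word in wordset
    }
    matrix = [
        [toks.count(word) / len(toks) * idf[word] for word in wordset]
        for toks in tokenized
    ]
    return wordset, idf, matrix

wordset, idf, matrix = tf_idf(["কাওছার আহমেদ", "শুভ হাইদার"])
```

Under these assumptions every idf value comes out as log10(2), and each non-zero matrix cell is half of that, matching the values shown in the outputs.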

4. Word Embedding

Word2Vec
- Training
- Get Word Vector
- Get Similarity
- Get n Similar Words
- Get Middle Word
- Get Odd Words
- Get Similarity Plot

Training

from ekushey.feature_extraction import BN_Word2Vec
# Training against sentences
w2v = BN_Word2Vec(sentences=[
    ['আমার','প্রিয়','জন্মভূমি'],
    ['বাংলা','আমার','মাতৃভাষা'],
    ['আমার','প্রিয়','জন্মভূমি'],
    ['বাংলা','আমার','মাতৃভাষা'],
    ['আমার','প্রিয়','জন্মভূমি'],
    ['বাংলা','আমার','মাতৃভাষা'],
])
w2v.train()
# Training against one text corpus
w2v = BN_Word2Vec(corpus_file="path_to_corpus.txt")
w2v.train()
# Training against multiple corpuses
'''
    path
      ->corpus
        ->1.txt
        ->2.txt
        ->3.txt
'''
w2v = BN_Word2Vec(corpus_path="path/corpus")
w2v.train(epochs=25)
# Training against a DataFrame column (news_data is assumed to be an existing DataFrame)
w2v = BN_Word2Vec(df=news_data['text_content'])
w2v.train(epochs=25)

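After training is done, the model "w2v_model" and its supporting vector files are saved to the current directory.

If you use a pretrained model, specify it when initializing BN_Word2Vec(). Otherwise no model name is needed.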
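
The `sentences` argument in the training example is a list of token lists. As a rough sketch of how a corpus directory laid out like `path/corpus` could be turned into that shape, the snippet below reads every `.txt` file and splits each line on whitespace. The function `load_sentences`, the one-sentence-per-line layout, and the whitespace tokenization are assumptions for illustration; they are not part of ekushey and may not match how the library reads `corpus_path`.

```python
import tempfile
from pathlib import Path

def load_sentences(corpus_path):
    """Read every .txt file under corpus_path and return a list of token lists."""
    sentences = []
    for txt_file in sorted(Path(corpus_path).glob("*.txt")):
        with txt_file.open(encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()
                if tokens:  # skip blank lines
                    sentences.append(tokens)
    return sentences

# Demo: build a throwaway corpus directory with two small files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "1.txt").write_text("আমার প্রিয় জন্মভূমি\n\n", encoding="utf-8")
    Path(tmp, "2.txt").write_text("বাংলা আমার মাতৃভাষা\n", encoding="utf-8")
    sentences = load_sentences(tmp)
```

The resulting `sentences` list has the same nesting as the literal passed to `BN_Word2Vec(sentences=...)` in the training example.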

Get Word Vector

from ekushey.feature_extraction import BN_Word2Vec
w2v = BN_Word2Vec(model_name='give the model name here')
w2v.get_wordVector('আমার')

Get Similarity

from ekushey.feature_extraction import BN_Word2Vec
w2v = BN_Word2Vec(model_name='give the model name here')
w2v.get_similarity('ঢাকা','রাজধানী')

Output:

67.457879
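
A value such as 67.457879 looks like a cosine similarity expressed as a percentage, though the source does not say how the score is computed. Under that assumption, the sketch below shows the calculation on plain Python lists; `cosine_similarity` and `similarity_percent` are hypothetical helpers, and the toy vectors stand in for real word vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similarity_percent(u, v):
    """Scale cosine similarity to a percentage (assumed presentation)."""
    return 100.0 * cosine_similarity(u, v)

parallel = similarity_percent([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
orthogonal = similarity_percent([1.0, 0.0], [0.0, 1.0])
```

Vectors pointing in the same direction score close to 100, and orthogonal vectors score 0.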

Get n Similar Words

from ekushey.feature_extraction import BN_Word2Vec
w2v = BN_Word2Vec(model_name='give the model name here')
w2v.get_n_similarWord(['পদ্মা'], n=10)

Get Middle Word

Returns the probability distribution of the center word given a list of words.

from ekushey.feature_extraction import BN_Word2Vec
w2v = BN_Word2Vec(model_name='give the model name here')
w2v.get_outputWord(['ঢাকায়','মৃত্যু'], n=2)

Output:

[("হয়েছে।',", 0.05880642), ('শ্রমিকের', 0.05639163)]

Get Odd Words

Finds the word that least matches the others in a given list.

from ekushey.feature_extraction import BN_Word2Vec
w2v = BN_Word2Vec(model_name='give the model name here')
w2v.get_oddWords(['চাল','ডাল','চিনি','আকাশ'])

Output:

'আকাশ'
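
One common way to pick the odd word out is to normalize every word vector, average them, and return the word whose vector is least aligned with that average. Whether `get_oddWords` works this way is not stated in the source, so the following is only a sketch of the general technique; the helper names and the two-dimensional toy vectors are made up for illustration.

```python
import math

def unit(vec):
    """Scale a vector to length 1 (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)

def odd_one_out(vectors):
    """Return the word whose unit vector is least aligned with the mean unit vector."""
    units = {word: unit(vec) for word, vec in vectors.items()}
    dim = len(next(iter(units.values())))
    mean = [sum(u[i] for u in units.values()) / len(units) for i in range(dim)]
    # The dot product with the mean measures how well a word fits the group.
    return min(units, key=lambda w: sum(a * b for a, b in zip(units[w], mean)))

# Toy vectors: three food words cluster together, the fourth points elsewhere.
toy_vectors = {
    "চাল": [0.9, 0.1],
    "ডাল": [0.8, 0.2],
    "চিনি": [0.85, 0.15],
    "আকাশ": [0.1, 0.9],
}
odd = odd_one_out(toy_vectors)
```

With these toy vectors the word that lies far from the food cluster is selected.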

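Get Similarity Plot

Creates a bar plot of similar words with their probabilities.

from ekushey.feature_extraction import BN_Word2Vec
w2v = BN_Word2Vec(model_name='give the model name here')
w2v.get_similarity_plot('চাউল', 5)

FastText
- Training
- Get Word Vector
- Get Similarity
- Get n Similar Words
- Get Middle Word
- Get Odd Words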

Training

from ekushey.feature_extraction import BN_FastText
# Training against sentences
ft = BN_FastText(sentences=[
    ['আমার','প্রিয়','জন্মভূমি'],
    ['বাংলা','আমার','মাতৃভাষা'],
    ['বাংলা','আমার','মাতৃভাষা'],
    ['বাংলা','আমার','মাতৃভাষা'],
    ['বাংলা','আমার','মাতৃভাষা'],
])
ft.train()
# Training against one text corpus
ft = BN_FastText(corpus_file="path to data or txt file")
ft.train()
# Training against multiple corpuses
'''
    path
      ->Corpus
        ->1.txt
        ->2.txt
        ->3.txt
'''
ft = BN_FastText(corpus_path="path/Corpus")
ft.train(epochs=25)
# Training against a DataFrame column (news_data is assumed to be an existing DataFrame)
ft = BN_FastText(df=news_data['text_content'])
ft.train(epochs=25)

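After training is done, the model "ft_model" and its supporting vector files are saved to the current directory.

If you want to use a pretrained model instead of training, specify it when initializing BN_FastText(). Otherwise no model name is needed.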

Get Word Vector

from ekushey.feature_extraction import BN_FastText
ft = BN_FastText(model_name='give the model name here')
ft.get_wordVector('আমার')

Get Similarity

from ekushey.feature_extraction import BN_FastText
ft = BN_FastText(model_name='give the model name here')
ft.get_similarity('ঢাকা','রাজধানী')

Output:

70.56821120

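Get n Similar Words

The call presumably mirrors the Word2Vec example, using a BN_FastText instance with the query word পদ্মা and n=10.

Output: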

[('পদ্মায়', 0.8103810548782349),
 ('পদ্মার', 0.794012725353241),
 ('পদ্মানদীর', 0.7747839689254761),
 ('পদ্মা-মেঘনার', 0.7573559284210205),
 ('পদ্মা.', 0.7470568418502808),
 ('‘পদ্মা', 0.7413997650146484),
 ('পদ্মাসেতুর', 0.716225266456604),
 ('পদ্ম', 0.7154797315597534),
 ('পদ্মহেম', 0.6881639361381531),
 ('পদ্মাবত', 0.6682782173156738)]
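
Most of the neighbours listed above are inflected or compounded forms of পদ্মা. That is what one would expect from a FastText-style model, which represents a word through its character n-grams so that words sharing substrings end up with related vectors. The sketch below shows only the n-gram extraction step; `char_ngrams` is a hypothetical helper, the 3 to 6 range is illustrative, and Python slices by code point rather than by visual Bangla grapheme.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """List character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(wrapped) - n + 1)
    ]

# N-grams shared by a word and one of its inflected forms.
shared = set(char_ngrams("পদ্মা")) & set(char_ngrams("পদ্মার"))
```

The non-empty `shared` set is the overlap that lets a subword model relate পদ্মার to পদ্মা even if one of them is rare in the training data.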

Get Odd Words

Finds the word that least matches the others in a given list.

from ekushey.feature_extraction import BN_FastText
ft = BN_FastText(model_name='give the model name here')
ft.get_oddWords(['চাল','ডাল','চিনি','আকাশ'])

Output:

'আকাশ'

Get Similarity Plot

Creates a bar plot of similar words with their probabilities.

GloVe
- Training
- Get n Similar Words

Training

from ekushey.feature_extraction import BN_GloVe
# Training against sentences
glv = BN_GloVe(sentences=[
    ['আমার','প্রিয়','জন্মভূমি'],
    ['বাংলা','আমার','মাতৃভাষা'],
    ['বাংলা','আমার','মাতৃভাষা'],
    ['বাংলা','আমার','মাতৃভাষা'],
    ['বাংলা','আমার','মাতৃভাষা'],
])
glv.train()
# Training against one text corpus
glv = BN_GloVe(corpus_file="path_to_corpus.txt")
glv.train()
# Training against multiple corpuses
'''
    path
      ->Corpus
        ->1.txt
        ->2.txt
        ->3.txt
'''
glv = BN_GloVe(corpus_path="path/corpus")
glv.train(epochs=25)
# Training against a DataFrame column (news_data is assumed to be an existing DataFrame)
glv = BN_GloVe(df=news_data['text_content'])
glv.train(epochs=25)

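After training is done, the trained GloVe model and its supporting vector files are saved to the current directory.

If you want to use a pretrained model instead of training, specify it when initializing BN_GloVe(). Otherwise no model name is needed.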

Get n Similar Words

from ekushey.feature_extraction import BN_GloVe
glv = BN_GloVe(model_name='give the model name here')
glv.get_n_similarWord(['পদ্মা'], n=10)

Output:

[('পদ্মায়', 0.8103810548782349),
 ('পদ্মার', 0.794012725353241),
 ('পদ্মানদীর', 0.7747839689254761),
 ('পদ্মা-মেঘনার', 0.7573559284210205),
 ('পদ্মা.', 0.7470568418502808),
 ('‘পদ্মা', 0.7413997650146484),
 ('পদ্মাসেতুর', 0.716225266456604),
 ('পদ্ম', 0.7154797315597534),
 ('পদ্মহেম', 0.6881639361381531),
 ('পদ্মাবত', 0.6682782173156738)]

ekushey 0.6
