Python Distiller package - PyPI

Automatically extract keywords from a collection of documents.

Detailed description of the Distiller Python project

Distiller
=========

format.

Requirements
------------

Distiller uses the [Natural Language Toolkit](http://www.nltk.org/).

You will need to download two NLTK packages:

    >>> import nltk
    >>> nltk.download()
    Downloader> d
    Download which package (l=list; x=cancel)?
      Identifier> maxent_treebank_pos_tagger
    Downloader> d
    Download which package (l=list; x=cancel)?
      Identifier> stopwords
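
When the interactive prompt is inconvenient (for example in a setup script), the same two packages can be fetched by passing their identifiers to `nltk.download`. The following is a minimal sketch: the helper name `download_required` and its optional `download` parameter are illustrative and not part of Distiller.

```python
# Identifiers of the two NLTK packages listed above.
REQUIRED_NLTK_PACKAGES = ("maxent_treebank_pos_tagger", "stopwords")


def download_required(download=None):
    """Fetch each required NLTK package and report success per package.

    `download` is any callable that takes a package identifier. It
    defaults to nltk.download, imported lazily so this module can be
    loaded even before NLTK is installed.
    """
    if download is None:
        import nltk  # needs NLTK installed and network access
        download = nltk.download
    return {name: bool(download(name)) for name in REQUIRED_NLTK_PACKAGES}
```

Calling `download_required()` with no arguments performs the real downloads; passing a stub callable makes the helper easy to check offline.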

Installation
------------

    >>> from distiller.distiller import Distiller
    >>> distiller = Distiller(data, target, options)

Arguments
---------

data

Path to the file containing the document collection in JSON format.

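    {
        "metadata": {
            "base_url": "Source URL of the documents (if any)"
        },
        "documents": [
            {
                "id": "Unique identifier of the document (if any)",
                "body": "The entire document body in a single text block."
            }, ...
        ]
    }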
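
A file in this layout can be produced with the standard `json` module. The sketch below is illustrative only: the English key names (`metadata`, `base_url`, `documents`, `id`, `body`) are assumed from the description above, and the helper names `build_collection` and `write_collection` are hypothetical.

```python
import json


def build_collection(documents, base_url=""):
    """Return a dict in the layout described above.

    `documents` is an iterable of (doc_id, body) pairs.
    The key names used here are assumptions, not confirmed Distiller keys.
    """
    return {
        "metadata": {"base_url": base_url},
        "documents": [
            {"id": str(doc_id), "body": body} for doc_id, body in documents
        ],
    }


def write_collection(path, documents, base_url=""):
    """Serialize the collection to `path` as UTF-8 JSON and return it."""
    collection = build_collection(documents, base_url)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(collection, handle, ensure_ascii=False, indent=2)
    return collection
```

The resulting path is what would be passed as the `data` argument.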

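documents.

trigrams: a list of word trigrams in the documents and how often each was detected as a key term.

docmap: a mapping of document IDs to their respective keyword n-grams, along with other statistics.

keymap: a mapping of keywords to the documents in which they appear.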
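
The relationship between the two mappings can be pictured as an inverted index. The sketch below assumes a simplified docmap of the form `{doc_id: [keyword, ...]}` with the extra statistics omitted; the on-disk layout Distiller actually writes is not shown here, and `invert_docmap` is a hypothetical helper.

```python
from collections import defaultdict


def invert_docmap(docmap):
    """Build a keyword -> [doc_id, ...] mapping from doc_id -> [keyword, ...].

    Document IDs are kept in first-seen order and listed once per keyword.
    """
    keymap = defaultdict(list)
    for doc_id, keywords in docmap.items():
        for keyword in keywords:
            if doc_id not in keymap[keyword]:
                keymap[keyword].append(doc_id)
    return dict(keymap)
```

For example, a docmap in which two documents share the keyword "training" yields a keymap entry for "training" that lists both document IDs.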

    # stem tokens during preprocessing
    "lemmatize": false,
    "tfidf_cutoff": 0.001,
    # cutoff for the term freq/doc freq score
    "pos_list": ["NN", "NNP"],
    # POS whitelist used to filter candidates
    }
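
To make the last two options concrete, here is a rough sketch of what a part-of-speech whitelist and a score cutoff do to a candidate list. It is not Distiller's implementation: the function names are hypothetical, the scores are taken as given rather than computed, and treating the cutoff as inclusive is an assumption.

```python
def filter_by_pos(tagged_tokens, pos_list):
    """Keep words whose POS tag is in the whitelist.

    `tagged_tokens` is a list of (word, tag) pairs, such as a
    part-of-speech tagger would produce.
    """
    allowed = set(pos_list)
    return [word for word, tag in tagged_tokens if tag in allowed]


def apply_cutoff(scores, cutoff):
    """Drop terms whose score is below the cutoff (inclusive bound assumed)."""
    return {term: score for term, score in scores.items() if score >= cutoff}
```

With `pos_list` set to `["NN", "NNP"]`, only common and proper singular nouns survive the first step; `tfidf_cutoff` then discards low-scoring terms.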


Distiller 0.1.2
