Python lexical-diversity package - PyPI

A simple program for calculating lexical diversity

Project description for lexical-diversity

Install using pip:

pip install lexical-diversity

Getting started:

>>> from lexical_diversity import lex_div as ld

Preprocessing texts:

For convenience, users can tokenize texts with the tokenize() function, or with a predefined tokenizing function (e.g., from nltk):

>>> text = """The state was named for the Colorado River, which Spanish travelers named the Río Colorado for the ruddy silt the river carried from the mountains. The Territory of Colorado was organized on February 28, 1861, and on August 1, 1876, U.S. President Ulysses S. Grant signed Proclamation 230 admitting Colorado to the Union as the 38th state. Colorado is nicknamed the "Centennial State" because it became a state a century after the signing of the United States Declaration of Independence. Colorado is bordered by Wyoming to the north, Nebraska to the northeast, Kansas to the east, Oklahoma to the southeast, New Mexico to the south, Utah to the west, and touches Arizona to the southwest at the Four Corners. Colorado is noted for its vivid landscape of mountains, forests, high plains, mesas, canyons, plateaus, rivers, and desert lands. Colorado is part of the western or southwestern United States, and one of the Mountain States. Denver is the capital and most populous city of Colorado. Residents of the state are known as Coloradans, although the antiquated term "Coloradoan" is occasionally used."""

>>> tok = ld.tokenize(text)
>>> print(tok[:10])
['the', 'state', 'was', 'named', 'for', 'the', 'colorado', 'river', 'which', 'spanish']
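The exact rules `ld.tokenize()` applies are not spelled out here. If you want a comparable stand-in that lowercases the text and drops punctuation, a minimal regex-based sketch follows. `simple_tokenize` is a hypothetical helper, not part of the package, and its output may differ from `ld.tokenize()` on abbreviations such as "U.S.".

```python
import re

def simple_tokenize(text):
    """Hypothetical tokenizer: lowercase the text, then keep runs of word
    characters (optionally joined by one internal apostrophe), so that
    punctuation such as commas and quotation marks is dropped."""
    return re.findall(r"\w+(?:'\w+)?", text.lower())

sample = "The state was named for the Colorado River, which Spanish travelers"
print(simple_tokenize(sample))
# ['the', 'state', 'was', 'named', 'for', 'the', 'colorado', 'river', 'which', 'spanish', 'travelers']
```

Because Python 3 regular expressions match Unicode word characters by default, accented words such as "Río" survive as single tokens.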

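For convenience, you can also lemmatize texts with the simple flemmatize() function. This function is not part-of-speech specific ("run" as a noun and "run" as a verb are treated as the same word). However, it is best to use a part-of-speech-sensitive lemmatizer (e.g., from spaCy).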

>>> flt = ld.flemmatize(text)
>>> print(flt[:10])
['the', 'state', 'be', 'name', 'for', 'the', 'colorado', 'river', 'which', 'spanish']
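To illustrate what "not part-of-speech specific" means in practice, here is a minimal dictionary-lookup sketch. The `LEMMAS` table and `lookup_lemmatize` function are hypothetical; the package's own word list is not shown in this description. Because the lookup is keyed only on the word form, every occurrence of a form maps to the same lemma regardless of its grammatical role.

```python
import re

# Hypothetical lemma table; a real lemmatizer relies on a much larger lexicon.
LEMMAS = {"was": "be", "is": "be", "are": "be", "named": "name",
          "runs": "run", "ran": "run"}

def lookup_lemmatize(text):
    """Tokenize, then replace each token with its dictionary lemma if one
    exists. Part of speech is ignored: a form always maps to one lemma."""
    tokens = re.findall(r"\w+(?:'\w+)?", text.lower())
    return [LEMMAS.get(tok, tok) for tok in tokens]

print(lookup_lemmatize("The state was named for the Colorado River"))
# ['the', 'state', 'be', 'name', 'for', 'the', 'colorado', 'river']
```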

Calculating lexical diversity

Simple TTR

>>> ld.ttr(flt)
0.5777777777777777
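Simple TTR (type-token ratio) divides the number of distinct word types by the total number of tokens. A minimal sketch of that textbook definition follows; `simple_ttr` is a hypothetical helper, not the package's function.

```python
def simple_ttr(tokens):
    """Type-token ratio: number of distinct types / number of tokens."""
    if not tokens:
        return 0.0  # assumption: define the empty text as 0
    return len(set(tokens)) / len(tokens)

print(simple_ttr(["the", "cat", "saw", "the", "dog"]))  # 4 types / 5 tokens = 0.8
```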

Root TTR

>>> ld.root_ttr(flt)
7.751702321999271
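Root TTR (Guiraud's index) divides the type count by the square root of the token count, which dampens the way plain TTR falls as texts get longer. A minimal sketch, with the hypothetical name `root_ttr_sketch`:

```python
import math

def root_ttr_sketch(tokens):
    """Root TTR (Guiraud's index): types / sqrt(tokens)."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / math.sqrt(len(tokens))

print(root_ttr_sketch(["a", "b", "c", "a"]))  # 3 / sqrt(4) = 1.5
```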

Log TTR

>>> ld.log_ttr(flt)
0.8943634681549878
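Log TTR (Herdan's C) is the logarithm of the type count divided by the logarithm of the token count. Since it is a ratio of two logarithms, the choice of base cancels out. A minimal sketch, with the hypothetical name `log_ttr_sketch`:

```python
import math

def log_ttr_sketch(tokens):
    """Log TTR (Herdan's C): log(types) / log(tokens)."""
    n = len(tokens)
    if n < 2:
        return 0.0  # log(1) == 0 would make the denominator zero
    return math.log(len(set(tokens))) / math.log(n)

print(log_ttr_sketch(["a", "b", "a", "b"]))  # log(2) / log(4), about 0.5
```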

Maas TTR

>>> ld.maas_ttr(flt)
0.04683980831849556
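Maas's index compares the logarithms of the token count N and the type count V: a² = (log N - log V) / (log N)². Unlike the measures above, lower values indicate greater lexical diversity (a text with no repeated words scores 0). The value shown above is consistent with base-10 logarithms, so this sketch uses `math.log10`; that is an inference from the example output, not documented behaviour. `maas_sketch` is a hypothetical name.

```python
import math

def maas_sketch(tokens):
    """Maas index a^2 = (log10 N - log10 V) / (log10 N)^2.
    Lower values mean greater lexical diversity."""
    n = len(tokens)
    if n < 2:
        return 0.0  # log10(1) == 0 would make the denominator zero
    v = len(set(tokens))
    return (math.log10(n) - math.log10(v)) / math.log10(n) ** 2

tokens = [str(i % 10) for i in range(100)]  # 100 tokens, 10 types
print(maas_sketch(tokens))  # (2 - 1) / 2**2 = 0.25
```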

Mean segmental TTR (MSTTR)

By default, the segment size is 50 words. However, this can be customized with the window_length argument.

>>> ld.msttr(flt)
0.7133333333333333

>>> ld.msttr(flt,window_length=25)
0.7885714285714285
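MSTTR splits the text into consecutive, non-overlapping segments of `window_length` tokens, computes the TTR of each segment, and averages them. The description above does not say what happens to leftover tokens; this sketch assumes the incomplete final segment is discarded. `msttr_sketch` is a hypothetical name.

```python
def msttr_sketch(tokens, window_length=50):
    """Mean segmental TTR over non-overlapping segments of window_length
    tokens. Assumption: the incomplete final segment is dropped."""
    n_segments = len(tokens) // window_length
    if n_segments == 0:
        return 0.0  # assumption: text shorter than one segment scores 0
    ttrs = []
    for i in range(n_segments):
        segment = tokens[i * window_length:(i + 1) * window_length]
        ttrs.append(len(set(segment)) / window_length)
    return sum(ttrs) / n_segments

tokens = ["a", "b", "c", "d", "a", "a", "a", "a", "x"]
print(msttr_sketch(tokens, window_length=4))  # (1.0 + 0.25) / 2 = 0.625; "x" is dropped
```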

Moving average TTR (MATTR)

By default, the window size is 50 words. However, this can be customized with the window_length argument.

>>> ld.mattr(flt)
0.7206106870229007

>>> ld.mattr(flt,window_length=25)
0.7961538461538458
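Unlike MSTTR, MATTR slides the window forward one token at a time, so a text of N tokens yields N - window_length + 1 overlapping windows whose TTRs are averaged. A minimal sketch follows; `mattr_sketch` is a hypothetical name, and the fallback for texts shorter than one window is an assumption.

```python
def mattr_sketch(tokens, window_length=50):
    """Moving-average TTR: mean TTR of every window of window_length
    consecutive tokens, sliding one token at a time."""
    n = len(tokens)
    if n == 0:
        return 0.0
    if n < window_length:
        return len(set(tokens)) / n  # assumption: fall back to plain TTR
    ttrs = [len(set(tokens[i:i + window_length])) / window_length
            for i in range(n - window_length + 1)]
    return sum(ttrs) / len(ttrs)

print(mattr_sketch(["a", "a", "b", "b"], window_length=2))  # (0.5 + 1.0 + 0.5) / 3
```

Recomputing `set()` for each window costs O(N x window_length); a production version would update a `collections.Counter` incrementally as the window slides, but the simpler form keeps the definition easy to read.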

Hypergeometric distribution D (HD-D)

A more straightforward and reliable implementation of vocD (Malvern, Richards, Chipere, & Duran, 2004), as described by McCarthy and Jarvis (2007, 2010).

>>> ld.hdd(flt)
0.7346993253061275
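HD-D replaces vocD's random sampling with an exact hypergeometric calculation: for each word type, it computes the probability that a random sample of tokens (drawn without replacement) contains that type at least once, divides by the sample size, and sums over all types. The sketch below assumes a sample size of 42 tokens, the value McCarthy and Jarvis used; whether the package uses the same value is an assumption. `hdd_sketch` is a hypothetical name, and `math.comb` requires Python 3.8+.

```python
import math
from collections import Counter

def hdd_sketch(tokens, sample_size=42):
    """HD-D: sum over types of P(type appears in a sample of sample_size
    tokens drawn without replacement), each divided by sample_size."""
    n = len(tokens)
    if n < sample_size:
        return 0.0  # assumption: undefined for texts shorter than the sample
    total = math.comb(n, sample_size)
    score = 0.0
    for freq in Counter(tokens).values():
        # Hypergeometric probability that the sample contains zero copies
        # of this type; math.comb returns 0 when n - freq < sample_size.
        p_absent = math.comb(n - freq, sample_size) / total
        score += (1.0 - p_absent) / sample_size
    return score

unique = [str(i) for i in range(100)]  # every token is a distinct type
print(hdd_sketch(unique))  # about 1.0, the maximum
```

When every token is distinct, each type contributes 1/N, so the score reaches its maximum of 1.0; a text consisting of one repeated word scores only 1/42.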

Measure of textual lexical diversity (MTLD)

Calculates MTLD based on McCarthy and Jarvis (2010).

>>> ld.mtld(flt)
36.50595044690307
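MTLD reads through the text and counts "factors": a factor ends when the running TTR of the current stretch drops below a threshold (0.72, the default reported by McCarthy and Jarvis, 2010), after which counting restarts. Leftover tokens at the end count as a partial factor. MTLD is the token count divided by the number of factors, averaged over a forward and a backward pass. This is a sketch of that published procedure, not the package's code; implementations differ in whether they compare with `<` or `<=` and whether they require a minimum factor length, so results may not match the value above exactly. The function names are hypothetical.

```python
def _mtld_one_direction(tokens, threshold=0.72):
    """Token count divided by the number of factors, reading in one direction."""
    factors = 0.0
    types = set()
    count = 0
    for tok in tokens:
        types.add(tok)
        count += 1
        # A factor is complete once the running TTR drops below the threshold
        if len(types) / count < threshold:
            factors += 1
            types = set()
            count = 0
    if count > 0:
        # Leftover tokens form a partial factor, scaled by how far their TTR
        # has fallen from 1.0 towards the threshold
        factors += (1 - len(types) / count) / (1 - threshold)
    if factors == 0:
        # Assumption: with no TTR drop at all (e.g. every token unique) MTLD is
        # undefined; this sketch returns the token count instead of dividing by 0
        return float(len(tokens))
    return len(tokens) / factors


def mtld_sketch(tokens, threshold=0.72):
    """Mean of the forward and backward MTLD values."""
    forward = _mtld_one_direction(tokens, threshold)
    backward = _mtld_one_direction(list(reversed(tokens)), threshold)
    return (forward + backward) / 2

print(mtld_sketch(["a", "a", "a", "a"]))  # two factors of length 2 each way -> 2.0
```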

Measure of textual lexical diversity (moving average, wrap)

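Calculates MTLD using a moving window approach. Instead of calculating partial factors, it wraps around to the beginning of the text to complete the last factor.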

>>> ld.mtld_ma_wrap(flt)
33.68333333333333
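One way to read the description above: start a factor at every token position, read forward until the running TTR drops below the threshold (wrapping to the start of the text when the end is reached), and average the resulting factor lengths. The sketch below follows that reading; the 0.72 threshold and the handling of a factor that never completes are assumptions, and `mtld_ma_wrap_sketch` is a hypothetical name.

```python
def mtld_ma_wrap_sketch(tokens, threshold=0.72):
    """Moving-window MTLD with wrap-around: the mean length of the factor
    that starts at each token position."""
    n = len(tokens)
    if n == 0:
        return 0.0
    lengths = []
    for start in range(n):
        types = set()
        length = n  # assumption: if no factor completes in one full pass, use n
        for offset in range(n):
            # The modulo index wraps back to the beginning of the text
            types.add(tokens[(start + offset) % n])
            if len(types) / (offset + 1) < threshold:
                length = offset + 1
                break
        lengths.append(length)
    return sum(lengths) / n

print(mtld_ma_wrap_sketch(["a", "b", "a", "a"]))  # factor lengths 3, 3, 2, 2 -> 2.5
```

Wrapping means every start position yields a full factor, so no partial-factor estimate is needed.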

Measure of textual lexical diversity (moving average, bi-directional)

Calculates the average MTLD score by computing MTLD in each direction with a moving window approach.

>>> ld.mtld_ma_bid(flt)
35.46626265150569
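A bi-directional variant can be sketched by running a moving-window factor count forward over the text and again over the reversed text, then averaging the two directional means. This sketch assumes that, without wrap-around, factors still incomplete at the end of the text are discarded, and that the threshold is 0.72; neither detail is confirmed by the description above. The function names are hypothetical.

```python
def _complete_factor_lengths(tokens, threshold=0.72):
    """From every start position, read forward until the running TTR drops
    below the threshold; keep only factors that complete before the end."""
    lengths = []
    for start in range(len(tokens)):
        types = set()
        for offset, tok in enumerate(tokens[start:]):
            types.add(tok)
            if len(types) / (offset + 1) < threshold:
                lengths.append(offset + 1)
                break
    return lengths


def mtld_ma_bid_sketch(tokens, threshold=0.72):
    """Mean of the forward and backward average factor lengths."""
    forward = _complete_factor_lengths(tokens, threshold)
    backward = _complete_factor_lengths(list(reversed(tokens)), threshold)
    # Assumption: a direction with no complete factors contributes nothing
    means = [sum(x) / len(x) for x in (forward, backward) if x]
    return sum(means) / len(means) if means else 0.0

print(mtld_ma_bid_sketch(["a", "b", "a", "a"]))  # (8/3 + 5/2) / 2
```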

lexical-diversity 0.1.0