Python ruword-frequency package - PyPI

A library that returns the word frequency (ipm) of almost any Russian word.

Detailed description of the ruword-frequency Python project

Description

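The Python library ruword_frequency returns the frequency of Russian words in ipm (items per million). Lookups are case-insensitive. The frequencies are based on a large collection of Russian documents and prepared word-frequency sources. Full list of sources:

A word's ipm is taken from every listed source and the values are averaged. The full index contains 7 billion words, which unfortunately include errors carried over from the original data sources.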
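For reference, ipm (items per million) is the number of times a word occurs per million tokens of a corpus. The averaging described above might look like the sketch below. It is not the library's code: the function names, the per-source count format, and the choice to average only over sources that actually contain the word (rather than counting a missing word as 0) are assumptions, since the description does not say how absent words are handled.

```python
from typing import Mapping, Tuple


def ipm(count: int, corpus_size: int) -> float:
    """Occurrences per million tokens: count / corpus_size * 1e6."""
    # Multiply first so whole-number results stay exact in floating point.
    return count * 1_000_000 / corpus_size


def average_ipm(word: str, sources: Mapping[str, Tuple[Mapping[str, int], int]]) -> float:
    """Average a word's ipm over the sources that contain it; 0.0 if none do.

    `sources` maps a (hypothetical) source name to (word counts, corpus size in tokens).
    """
    key = word.lower()  # lookups are case-insensitive
    values = [ipm(counts[key], size) for counts, size in sources.values() if key in counts]
    return sum(values) / len(values) if values else 0.0


# Made-up counts for two hypothetical corpora.
sources = {
    "corpus_a": ({"привет": 50, "пока": 12}, 1_000_000),
    "corpus_b": ({"привет": 30}, 500_000),
}
print(average_ipm("Привет", sources))  # (50.0 + 60.0) / 2 = 55.0
print(average_ipm("неттакогослова", sources))  # 0.0
```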

Requirements:

Python 3
The word index takes up almost 50 MB on disk and is downloaded the first time you call the frequency.load() method.

Installation

# TODO

Usage

from ruword_frequency import Frequency

freq = Frequency()
freq.load()  # downloads the word index on the first call

freq.ipm('привет')
# 53.51823806762695

freq.ipm('неттакогослова')  # unknown words return 0.0
# 0.0

# get the max ipm value, e.g. for weight normalization
freq.max_ipm()
# 42329.2890625

# print the most used words, with ipm greater than 10000
for w in freq.iterate_words(10000):
    print(w)
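The comment above mentions using max_ipm() for weight normalization. A minimal sketch of that idea divides each word's ipm by the maximum so every weight falls in [0, 1]. The helper `ipm_weights` and the `StubFrequency` class are hypothetical: the stub hard-codes the two values from the example so the snippet runs without downloading the index.

```python
class StubFrequency:
    """Hypothetical stand-in for a loaded ruword_frequency.Frequency,
    hard-coding the values from the usage example so this runs offline."""

    _table = {"привет": 53.51823806762695}

    def ipm(self, word):
        # Case-insensitive lookup; unknown words get 0.0, as in the example.
        return self._table.get(word.lower(), 0.0)

    def max_ipm(self):
        return 42329.2890625


def ipm_weights(words, freq):
    """Map each word to freq.ipm(word) / freq.max_ipm(), a weight in [0, 1]."""
    top = freq.max_ipm()
    return {w: freq.ipm(w) / top for w in words}


weights = ipm_weights(["привет", "неттакогослова"], StubFrequency())
print(weights)
```

With the real library, pass the loaded `freq` object from the usage example instead of `StubFrequency()`.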

For other useful methods, see the marisa-trie documentation. The trie index is available as freq.tree.
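Because the index is a trie, prefix queries are cheap, which suits autocomplete-style lookups. The sketch below ranks the words that start with a prefix by ipm. It assumes `freq.tree` supports `keys(prefix)`, as marisa-trie's `Trie` and `BytesTrie` classes do; this is not confirmed for this library. `top_by_prefix` and `StubTree` are hypothetical names, and the stub stands in for the real trie so the snippet runs on its own.

```python
import heapq
from typing import Callable, Iterable, List, Tuple


def top_by_prefix(tree, ipm: Callable[[str], float], prefix: str, n: int = 5) -> List[Tuple[str, float]]:
    """Return up to n (word, ipm) pairs for words starting with prefix, highest ipm first."""
    # keys(prefix) enumerates stored keys that start with prefix (marisa-trie style);
    # set() removes duplicates, which record-style tries can yield.
    words = set(tree.keys(prefix))
    return heapq.nlargest(n, ((w, ipm(w)) for w in words), key=lambda pair: pair[1])


class StubTree:
    """Hypothetical stand-in for freq.tree: a sorted word list with keys(prefix)."""

    def __init__(self, words: Iterable[str]):
        self._words = sorted(words)

    def keys(self, prefix: str = "") -> List[str]:
        return [w for w in self._words if w.startswith(prefix)]


# Made-up ipm values for the demo.
demo_ipm = {"привет": 53.5, "приветствие": 4.0, "прив": 0.5, "пока": 20.0}
tree = StubTree(demo_ipm)
print(top_by_prefix(tree, demo_ipm.get, "прив", n=2))
# [('привет', 53.5), ('приветствие', 4.0)]
```

With the real index the call would be `top_by_prefix(freq.tree, freq.ipm, 'прив')`, provided `freq.tree` exposes `keys(prefix)` and its keys are plain words.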

Rebuilding the trie yourself

import socket

from ruword_frequency import Frequency
from ruword_frequency.source_reader import SourceReader

# increase the socket timeout; this sometimes helps when downloading huge files
socket.setdefaulttimeout(60)

reader = SourceReader()
reader.download_all_sources()
tree = reader.build_tree_from_dictionaries()
reader.save_tree(tree)

# use it
freq = Frequency()
freq.ipm('привет')


ruword-frequency 0.0.1
