[![Build Status](https://travis-ci.org/hermanschaaf/mafan.svg?branch=master)](https://travis-ci.org/hermanschaaf/mafan)

Mafan - Toolkit for working with Chinese in Python
==

Contained in here is an ever-growing collection of loosely related tools, broken down into several files. These are:

installation
==

Install through pip: `pip install mafan`.

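encodings
==

`encodings` contains functions for converting files from any number of character encodings to something more sane (utf-8, by default). For example:

```python
from mafan import encoding

filename = 'ugh_big5.txt'  # name or path of the file, as a string
encoding.convert(filename)  # creates a file named 'ugh_big5_utf-8.txt' in glorious utf-8 encoding
```

text
==

`text` contains some functions for working with strings: detecting English in a string, checking whether a string has Chinese punctuation, and so on. Check out `text.py` for the latest additions. It also contains a handy wrapper around the jianfan package for converting between simplified and traditional characters: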

```python
>>> from mafan import simplify, tradify
>>> string = u'这是麻烦啦'
>>> print(tradify(string))  # convert string to traditional
這是麻煩啦
>>> print(simplify(tradify(string)))  # convert back to simplified
这是麻烦啦

>>> from mafan import text
>>> text.has_punctuation(u'这是麻烦啦')  # check for any Chinese punctuation (full stops, commas, quotation marks, etc.)
False
>>> text.has_punctuation(u'这是麻烦啦。')
True
>>> text.contains_latin(u'这是麻烦啦。')
False
>>> text.contains_latin(u'You are麻烦啦。')
True
```
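
To illustrate the kind of check that `has_punctuation` and `contains_latin` perform, here is a minimal sketch using only the standard library. It is not mafan's implementation: the names `CHINESE_PUNCTUATION`, `has_chinese_punctuation`, and `contains_latin_letters` are hypothetical, and the punctuation set is a small illustrative subset.

```python
# -*- coding: utf-8 -*-
import re

# Illustrative subset of full-width Chinese punctuation (an assumption, not mafan's list).
CHINESE_PUNCTUATION = set(u"。，、；：？！「」『』（）《》…")

# Basic ASCII letters only; accented Latin letters are out of scope for this sketch.
_LATIN_RE = re.compile(u"[A-Za-z]")


def has_chinese_punctuation(s):
    """Return True if any character of s is in CHINESE_PUNCTUATION."""
    return any(ch in CHINESE_PUNCTUATION for ch in s)


def contains_latin_letters(s):
    """Return True if s contains at least one ASCII letter."""
    return _LATIN_RE.search(s) is not None


print(has_chinese_punctuation(u"这是麻烦啦。"))
print(contains_latin_letters(u"You are麻烦啦。"))
```

An ASCII full stop is deliberately not in the set, so a sentence ending in `.` is not counted as having Chinese punctuation.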

You can also test whether sentences or documents use simplified characters, traditional characters, both, or neither:

```python
>>> import mafan
>>> from mafan import text
>>> text.is_simplified(u'这是麻烦啦')
True
>>> text.is_traditional(u'Hello,這是麻煩啦')
True

# Or done another way:
>>> text.identify(u'这是麻烦啦') is mafan.SIMPLIFIED
True
>>> text.identify(u'這是麻煩啦') is mafan.TRADITIONAL
True
>>> text.identify(u'这是麻烦啦!這是麻煩啦') is mafan.BOTH
True
>>> text.identify(u'This is so mafan.') is mafan.NEITHER  # or None
True
```

The identification functionality is introduced as a very thin wrapper around Thomas Roten's [hanzidentifier](https://github.com/tsroten/hanzidentifier), which is included as part of mafan.
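
As a rough illustration of how script identification can work, the sketch below classifies a string by looking for characters that exist in only one script. It is a toy, not the hanzidentifier or mafan implementation: the character sets are tiny hand-picked samples, and the names `identify_script`, `SIMPLIFIED_ONLY`, `TRADITIONAL_ONLY`, and the label values are hypothetical.

```python
# -*- coding: utf-8 -*-
# Tiny hand-picked sets for illustration only; a real identifier needs full character tables.
SIMPLIFIED_ONLY = set(u"这烦")
TRADITIONAL_ONLY = set(u"這煩")

# Hypothetical labels for this sketch; mafan exposes its own constants.
SIMPLIFIED, TRADITIONAL, BOTH, NEITHER = "simplified", "traditional", "both", None


def identify_script(s):
    """Classify s by which script-specific characters it contains."""
    has_simp = any(ch in SIMPLIFIED_ONLY for ch in s)
    has_trad = any(ch in TRADITIONAL_ONLY for ch in s)
    if has_simp and has_trad:
        return BOTH
    if has_simp:
        return SIMPLIFIED
    if has_trad:
        return TRADITIONAL
    # No script-specific character found (non-Chinese text, or only shared characters).
    return NEITHER


print(identify_script(u"這是麻煩啦"))
```

One limitation of this simplification: text made up only of characters shared by both scripts falls through to `NEITHER`, which may differ from how mafan classifies it.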

Another function that comes pre-built into mafan is `split_text`, which tokenizes Chinese sentences into words:

```python
>>> from mafan import split_text
>>> split_text(u"這是麻煩啦")
[u'\u9019', u'\u662f', u'\u9ebb\u7169', u'\u5566']
>>> print(' '.join(split_text(u"這是麻煩啦")))
這 是 麻煩 啦
```

You can also optionally pass the boolean `include_part_of_speech` parameter to get tagged words back:

```python
>>> split_text(u"這是麻煩啦", include_part_of_speech=True)
[(u'\u9019', 'r'), (u'\u662f', 'v'), (u'\u9ebb\u7169', 'x'), (u'\u5566', 'y')]
```
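
As a usage sketch, tagged output of this shape can be post-processed with plain Python. The example hard-codes the list from the transcript instead of calling mafan, so it runs without the package installed; the helper name `words_with_tag` is hypothetical.

```python
# Tagged output copied from the transcript: (word, part-of-speech tag) pairs.
tagged = [(u'\u9019', 'r'), (u'\u662f', 'v'), (u'\u9ebb\u7169', 'x'), (u'\u5566', 'y')]


def words_with_tag(pairs, tag):
    """Return the words whose part-of-speech tag equals `tag`."""
    return [word for word, pos in pairs if pos == tag]


# Select only the words tagged 'v'.
v_words = words_with_tag(tagged, 'v')

# Drop the tags and join the words with spaces to rebuild the segmented sentence.
sentence = u' '.join(word for word, _ in tagged)
```

Keeping the pairs as tuples makes it easy to filter by tag first and discard the tags only at the end.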

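pinyin
===

`pinyin` contains functions for working with or converting between pinyin. At the moment, the only function in there converts numbered pinyin to pinyin with correct tone marks. For example:

```python
>>> from mafan import pinyin
>>> print(pinyin.decode("ni3hao3"))
nǐhǎo
```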
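
For readers curious how such a conversion can work, here is a minimal sketch of numbered-to-marked pinyin using the common tone placement rule. It is not mafan's `pinyin.decode`: the names `TONE_MARKS` and `numbered_to_marked` are hypothetical, and the sketch assumes lowercase input in which every syllable ends in a tone digit from 1 to 5.

```python
import re

# Tone-marked forms of each vowel for tones 1-4, written as escapes.
TONE_MARKS = {
    u'a': u'\u0101\u00e1\u01ce\u00e0',       # a with tones 1-4
    u'e': u'\u0113\u00e9\u011b\u00e8',       # e with tones 1-4
    u'i': u'\u012b\u00ed\u01d0\u00ec',       # i with tones 1-4
    u'o': u'\u014d\u00f3\u01d2\u00f2',       # o with tones 1-4
    u'u': u'\u016b\u00fa\u01d4\u00f9',       # u with tones 1-4
    u'\u00fc': u'\u01d6\u01d8\u01da\u01dc',  # u-umlaut with tones 1-4
}

# One syllable: lowercase letters (matched lazily) followed by a tone digit 1-5.
_SYLLABLE_RE = re.compile(u'([a-z\u00fc]+?)([1-5])')


def _mark_syllable(match):
    letters, tone = match.group(1), int(match.group(2))
    if tone == 5:  # neutral tone carries no mark
        return letters
    # Placement rule: 'a' or 'e' takes the mark; in 'ou' the 'o' does;
    # otherwise the last vowel of the syllable does.
    if u'a' in letters:
        idx = letters.index(u'a')
    elif u'e' in letters:
        idx = letters.index(u'e')
    elif u'ou' in letters:
        idx = letters.index(u'o')
    else:
        vowels = [i for i, ch in enumerate(letters) if ch in TONE_MARKS]
        if not vowels:
            return letters  # no vowel to mark; drop the digit
        idx = vowels[-1]
    marked = TONE_MARKS[letters[idx]][tone - 1]
    return letters[:idx] + marked + letters[idx + 1:]


def numbered_to_marked(text):
    """Convert numbered pinyin such as 'ni3hao3' to tone-marked pinyin."""
    return _SYLLABLE_RE.sub(_mark_syllable, text)


print(numbered_to_marked(u"ni3hao3"))
```

Passing a function to `re.sub` keeps the syllable splitting and the mark placement in separate, testable steps.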

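For better traditional character support, mafan can be set up to use a larger dictionary file.

Options:

 - Set the environment variable `MAFAN_DICTIONARY_PATH` to the absolute path of a local copy of this [dictionary file](https://github.com/fxsjy/jieba/raw/master/extra_dict/dict.txt.big),
 - or install the `mafan_traditional` convenience package: `pip install mafan_traditional`. If this package is installed and available, mafan will default to using this extended dictionary file.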
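
The environment variable option can also be applied from Python. This is a minimal sketch under stated assumptions: the helper name `use_dictionary` and the example file location are hypothetical, and it assumes mafan reads `MAFAN_DICTIONARY_PATH` from the process environment when it loads its dictionary, so the variable is set before mafan is imported.

```python
import os


def use_dictionary(path):
    """Point MAFAN_DICTIONARY_PATH at `path`, which must be an absolute path."""
    if not os.path.isabs(path):
        raise ValueError("MAFAN_DICTIONARY_PATH must be an absolute path: %r" % path)
    os.environ["MAFAN_DICTIONARY_PATH"] = path
    return path


# Hypothetical location: a copy of dict.txt.big in the current directory.
use_dictionary(os.path.abspath("dict.txt.big"))

# Import mafan only after this point, so the variable is already set.
```

Rejecting relative paths early avoids a setting that silently depends on the working directory.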

Contributors:
--
 * Herman Schaaf ([ironzebra.com](http://www.ironzebra.com)) (Author)
 * Thomas Roten ([GitHub](https://github.com/tsroten/))
 * [joewonglvfs](https://github.com/joewonglvfs)
 * Casper CY Ching ([GitHub](https://github.com/cycching))

Any contributions are very welcome!

Sites using this:
--
 * [chineselevel.com](http://www.chineselevel.com)

mafan 0.3.1
