chirptext Python package - PyPI

chirptext is a collection of text processing tools for Python.

chirptext project description

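chirptext is a collection of text processing tools for Python. It is not meant to be a powerful tank like the popular NLTK, but a small package that you can pip-install anywhere and use to process text data in a few lines of code.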

Main features

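[New] Using MeCab/deko on Windows does not require the mecab-python3 package. Only the binary release (mecab.exe) is needed.
Text annotation framework (TTL, also known as the texttaglib format), which can import/export JSON or human-readable text files
Helper functions and useful data for processing English, Japanese, Chinese and Vietnamese
Quick generation of text-based reports
Application configuration file management, which can make an educated guess about where the configuration file is located
Web fetcher with responsible web crawling ethics (supports caching out of the box)
CSV helper functions
Console application template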

Project homepage: https://github.com/letuananh/chirptext

Installation

pip install chirptext
# pip script sometimes doesn't work properly, so you may want to try this instead
python3 -m pip install chirptext

Note: the chirptext library no longer supports Python 2. Please update to Python 3 to use this package.

Sample code

Using MeCab on Windows

You can download the MeCab binary package from http://taku910.github.io/mecab/#download and install it. After installing, you can try:

>>> from chirptext import deko
>>> sent = deko.parse('猫が好きです。')
>>> sent.tokens
[[猫(名詞-一般/*/*|猫|ネコ|ネコ)], [が(助詞-格助詞/一般/*|が|ガ|ガ)], [好き(名詞-形容動詞語幹/*/*|好き|スキ|スキ)], [です(助動詞-*/*/*|です|デス|デス)], [。(記号-句点/*/*|。|。|。)], [EOS(-//|||)]]
>>> sent.words
['猫', 'が', '好き', 'です', '。']
>>> sent[0].pos
'名詞'
>>> sent[0].root
'猫'
>>> sent[0].reading
'ネコ'

If you installed MeCab to a custom location, for example C:\mecab\bin\mecab.exe, try:

>>> deko.set_mecab_bin("C:\\mecab\\bin\\mecab.exe")
>>> deko.get_mecab_bin()
'C:\\mecab\\bin\\mecab.exe'
>>> # Just that & now you can use mecab
>>> deko.parse('雨が降る。').words
['雨', 'が', '降る', '。']
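
For readers curious what driving a standalone MeCab binary can look like, here is a minimal standard-library sketch. It is not chirptext's implementation: `run_mecab` and `parse_mecab_output` are hypothetical helper names, and the parser assumes MeCab's default output format (surface form, a tab, comma-separated features, and a closing `EOS` line) with a UTF-8 dictionary.

```python
import subprocess


def parse_mecab_output(output):
    """Turn MeCab's default text output into (surface, features) pairs."""
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and the end-of-sentence marker
        surface, _, feature_str = line.partition("\t")
        tokens.append((surface, feature_str.split(",")))
    return tokens


def run_mecab(text, mecab_bin="mecab"):
    """Pipe one sentence through a MeCab executable and parse the result."""
    # Assumption: mecab_bin is on PATH (or a full path to mecab.exe) and the
    # installed dictionary uses UTF-8.
    result = subprocess.run(
        [mecab_bin],
        input=text + "\n",
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return parse_mecab_output(result.stdout)


# Parsing works without MeCab installed, shown here on canned output.
sample = "猫\t名詞,一般,*,*,*,*,猫,ネコ,ネコ\nEOS\n"
tokens = parse_mecab_output(sample)
print(tokens[0][0], tokens[0][1][0])
```

Keeping the parser separate from the subprocess call means the parsing logic can be checked on machines where MeCab is not installed.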

Convenient IO APIs

>>> from chirptext import chio
>>> chio.write_tsv('data/test.tsv', [['a', 'b'], ['c', 'd']])
>>> chio.read_tsv('data/test.tsv')
[['a', 'b'], ['c', 'd']]
>>> chio.write_file('data/content.tar.gz', 'Support writing to .tar.gz file')
>>> chio.read_file('data/content.tar.gz')
'Support writing to .tar.gz file'
>>> for row in chio.read_tsv_iter('data/test.tsv'):
...     print(row)
...
['a', 'b']
['c', 'd']
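
For comparison, the same TSV round trip can be sketched with only the standard library `csv` module. The function names below mirror the `chio` calls above for readability, but they are illustrative stand-ins and say nothing about how chirptext implements them.

```python
import csv
import os
import tempfile


def write_tsv(path, rows):
    """Write rows (lists of strings) as tab-separated values."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerows(rows)


def read_tsv_iter(path):
    """Yield one row at a time, so large files need not fit in memory."""
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.reader(f, delimiter="\t")


def read_tsv(path):
    """Read the whole file into a list of rows."""
    return list(read_tsv_iter(path))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.tsv")
    write_tsv(path, [["a", "b"], ["c", "d"]])
    rows = read_tsv(path)

print(rows)  # [['a', 'b'], ['c', 'd']]
```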

Web fetcher

>>> from chirptext import WebHelper
>>> web = WebHelper('~/tmp/webcache.db')
>>> data = web.fetch('https://letuananh.github.io/test/data.json')
>>> data
b'{ "name": "Kungfu Panda" }\n'
>>> data_json = web.fetch_json('https://letuananh.github.io/test/data.json')
>>> data_json
{'name': 'Kungfu Panda'}
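
The idea of a fetcher backed by a cache database can be illustrated with a small sketch built on `sqlite3` and `urllib.request`. This is an assumption-laden illustration, not WebHelper's actual design: the `CachedFetcher` class, its table layout, and the injectable `downloader` argument are all hypothetical. The demo uses a fake downloader so it runs offline.

```python
import sqlite3
import urllib.request


class CachedFetcher:
    """Fetch URLs, storing each response body in SQLite on first access."""

    def __init__(self, db_path=":memory:", downloader=None):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (url TEXT PRIMARY KEY, body BLOB)"
        )
        # downloader is injectable so the class can be exercised offline
        self.downloader = downloader or self._download

    @staticmethod
    def _download(url):
        with urllib.request.urlopen(url) as resp:
            return resp.read()

    def fetch(self, url):
        row = self.conn.execute(
            "SELECT body FROM cache WHERE url = ?", (url,)
        ).fetchone()
        if row is not None:
            return bytes(row[0])  # cache hit: no network request
        body = self.downloader(url)
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (url, body) VALUES (?, ?)", (url, body)
        )
        self.conn.commit()
        return body


calls = []


def fake_download(url):
    """Stand-in for a real HTTP request; records each call."""
    calls.append(url)
    return b'{ "name": "Kungfu Panda" }\n'


web = CachedFetcher(downloader=fake_download)
first = web.fetch("https://example.org/data.json")
second = web.fetch("https://example.org/data.json")
print(first == second, len(calls))  # True 1
```

Serving repeat requests from the cache is one way a fetcher avoids hitting the same server twice for the same resource.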

Using Counter

from chirptext import Counter, TextReport
from chirptext.leutile import LOREM_IPSUM

ct = Counter()
vc = Counter()  # vowel counter
for char in LOREM_IPSUM:
    if char == ' ':
        continue
    ct.count(char)
    vc.count("Letters")
    if char in 'auieo':
        vc.count("Vowels")
    else:
        vc.count("Consonants")
vc.summarise()
ct.summarise(byfreq=True, limit=5)

Output

Letters: 377 
Consonants: 212 
Vowels: 165 
i: 42 
e: 37 
t: 32 
o: 29 
a: 29
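
The same tallying pattern can be sketched with the standard library's `collections.Counter`, here on a short sample string so the numbers are easy to verify by hand. This is a comparison sketch, not a description of chirptext's Counter; the variable names are illustrative.

```python
from collections import Counter

text = "hello world"
letters = Counter()  # per-character frequencies
groups = Counter()   # totals per category

for char in text:
    if char == " ":
        continue
    letters[char] += 1
    groups["Letters"] += 1
    if char in "auieo":
        groups["Vowels"] += 1
    else:
        groups["Consonants"] += 1

for label, count in groups.most_common():
    print(f"{label}: {count}")
for char, count in letters.most_common(2):
    print(f"{char}: {count}")
```

As in the example above, any non-space character that is not a lowercase vowel (punctuation and capital letters included) is tallied under "Consonants".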

Sample text report

# a string report
rp = TextReport()  # by default, TextReport will write to standard output, i.e. terminal
rp = TextReport(TextReport.STDOUT)  # same as above
rp = TextReport('~/tmp/my-report.txt')  # output to a file
rp = TextReport.null()  # output to /dev/null, i.e. nowhere
rp = TextReport.string()  # output to a string. Call rp.content() to get the string
rp = TextReport(TextReport.STRINGIO)  # same as above

# TextReport will close the output stream automatically by using the with statement
with TextReport.string() as rp:
    rp.header("Lorem Ipsum Analysis", level="h0")
    rp.header("Raw", level="h1")
    rp.print(LOREM_IPSUM)
    rp.header("Top 5 most common letters")
    ct.summarise(report=rp, limit=5)
    print(rp.content())

Output

+---------------------------------------------------------------------------------- 
| Lorem Ipsum Analysis 
+---------------------------------------------------------------------------------- 
 
Raw 
------------------------------------------------------------ 
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. 
 
Top 5 most common letters
------------------------------------------------------------ 
i: 42 
e: 37 
t: 32 
o: 29 
a: 29
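
The "report to a string" pattern can be approximated with `io.StringIO` from the standard library. The sketch below is only an analogy under that assumption: the `header` helper is hypothetical and does not reproduce TextReport's API or its exact formatting.

```python
import io


def header(out, title, underline="-", width=60):
    """Write a title followed by a rule line to the given stream."""
    print(title, file=out)
    print(underline * width, file=out)


with io.StringIO() as out:
    header(out, "Top 2 most common letters")
    for char, count in [("l", 3), ("o", 2)]:
        print(f"{char}: {count}", file=out)
    content = out.getvalue()  # read before the with block closes the stream

print(content)
```

The buffer has to be read inside the `with` block, because the stream is closed on exit.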

chirptext 0.1a18
