Tool to build the Unihan dataset into datapackage / Simple Data Format.
cihaidata-unihan: a tool to build Unihan into a Simple Data Format CSV. Part of the cihai project.
Unihan's data is spread across multiple files, in this format:
U+3400 kCantonese jau1
U+3400 kDefinition (same as U+4E18 丘) hillock or mound
U+3400 kMandarin qiū
U+3401 kCantonese tim2
U+3401 kDefinition to lick; to taste, a mat, bamboo bark
U+3401 kHanyuPinyin 10019.020:tiàn
U+3401 kMandarin tiàn
cihaidata_unihan/process.py downloads Unihan.zip and builds all of its files into a single tabular CSV (default output: ./data/unihan.csv):
char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin
㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū
㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn
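The reshaping from one-record-per-line to one-row-per-character can be sketched in a few lines of Python. This is a minimal illustration, not the code in process.py: the helper names (`ucn_to_char`, `pivot`, `to_csv`) are hypothetical, it targets Python 3 only, and it assumes the fields in the Unihan text files are tab-separated and that comment lines start with `#`.

```python
import csv
import io

# Sample records in the Unihan text layout (assumed tab-separated).
SAMPLE = "\n".join([
    "# comment lines are skipped",
    "U+3400\tkCantonese\tjau1",
    "U+3400\tkDefinition\t(same as U+4E18 丘) hillock or mound",
    "U+3400\tkMandarin\tqiū",
    "U+3401\tkCantonese\ttim2",
    "U+3401\tkDefinition\tto lick; to taste, a mat, bamboo bark",
    "U+3401\tkHanyuPinyin\t10019.020:tiàn",
    "U+3401\tkMandarin\ttiàn",
])

# Columns to keep, in output order (taken from the CSV sample above).
FIELDS = ["kCantonese", "kDefinition", "kHanyuPinyin", "kMandarin"]


def ucn_to_char(ucn):
    """Convert 'U+XXXX' notation to the character it names."""
    return chr(int(ucn[2:], 16))


def pivot(lines, fields=FIELDS):
    """Group (ucn, field, value) records into one dict per code point."""
    rows = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ucn, field, value = line.split("\t", 2)
        if field not in fields:
            continue
        row = rows.setdefault(ucn, {"char": ucn_to_char(ucn), "ucn": ucn})
        row[field] = value
    return list(rows.values())


def to_csv(rows, fields=FIELDS):
    """Serialize the rows as CSV text; missing fields become empty cells."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["char", "ucn"] + list(fields),
        restval="",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(pivot(SAMPLE.splitlines())))
```

Values that contain a comma, such as the kDefinition of U+3401, are quoted by the csv module, which is why that cell appears in double quotes in the output.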
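process.py supports command-line arguments. See the cihaidata_unihan/process.py CLI arguments for how to specify custom columns, files, the download URL, and the output destination.
Built against unit tests. See the Travis Builds and the Revision History.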
Usage
To download and build your own unihan.csv:
$ ./cihaidata_unihan/process.py
This creates data/unihan.csv.
For advanced usage examples, see the cihaidata_unihan/process.py CLI arguments.
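Once a unihan.csv exists, it can be read with the standard library alone. The sketch below assumes the default columns (char, ucn, kCantonese, kDefinition, kHanyuPinyin, kMandarin); `load_unihan` is a hypothetical helper, not part of the package's API, and an in-memory sample stands in for the real file so the snippet runs without a build.

```python
import csv
import io


def load_unihan(fileobj):
    """Index the rows of a unihan.csv-style file by their 'char' column."""
    return {row["char"]: row for row in csv.DictReader(fileobj)}


# Stand-in for: open("data/unihan.csv", encoding="utf-8", newline="")
SAMPLE_CSV = io.StringIO(
    "char,ucn,kCantonese,kDefinition,kHanyuPinyin,kMandarin\n"
    "㐀,U+3400,jau1,(same as U+4E18 丘) hillock or mound,,qiū\n"
    '㐁,U+3401,tim2,"to lick; to taste, a mat, bamboo bark",10019.020:tiàn,tiàn\n'
)

table = load_unihan(SAMPLE_CSV)

# Look up a single character's readings.
print(table["㐁"]["kMandarin"])
print(table["㐁"]["kCantonese"])

# Fields with no data for a character come back as empty strings.
print(repr(table["㐀"]["kHanyuPinyin"]))
```

A dict keyed by character keeps lookups simple; for the full dataset, iterating over `csv.DictReader` row by row avoids holding every row in memory.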
Structure
# dataset metadata, schema information.
datapackage.json

# (future) when this package is stable, unihan.csv will be provided
data/unihan.csv

# stores downloaded Unihan.zip and its txt file contents (.gitignore'd)
data/build_files/

# script to download + build a SDF csv of unihan.
cihaidata_unihan/process.py

# unit tests to verify behavior / consistency of builder
tests/*

# python 2/3 compatibility modules
cihaidata_unihan/_compat.py
cihaidata_unihan/unicodecsv.py

# python module, public-facing python API.
__init__.py
cihaidata_unihan/__init__.py

# utility / helper functions
cihaidata_unihan/util.py
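To give a rough idea of what the metadata in datapackage.json might describe, here is a sketch that derives a minimal descriptor from the CSV column names. The layout (`name`, `resources`, `path`, `schema.fields`) follows the general Data Package convention as an assumption; the project's actual datapackage.json may contain different or additional keys, and `build_descriptor` is a hypothetical helper.

```python
import json


def build_descriptor(columns, path="data/unihan.csv"):
    """Build a minimal Data Package-style descriptor for one CSV resource.

    Every column is declared as a string, which is an assumption made
    for illustration only.
    """
    return {
        "name": "cihaidata-unihan",
        "resources": [
            {
                "path": path,
                "schema": {
                    "fields": [
                        {"name": col, "type": "string"} for col in columns
                    ]
                },
            }
        ],
    }


columns = ["char", "ucn", "kCantonese", "kDefinition", "kHanyuPinyin", "kMandarin"]
print(json.dumps(build_descriptor(columns), indent=2, ensure_ascii=False))
```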
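Cihai is not required for:
- data/unihan.csv: a CSV file compatible with the Simple Data Format.
- cihaidata_unihan/process.py: creates data/unihan.csv.
When this module is stable, data/unihan.csv will be available as a prepared release, so running cihaidata_unihan/process.py will not be necessary. process.py requires no external libraries.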
Related links:
Python support | Python 2.7, >= 3.3, pypy/pypy3 |
Source | https://github.com/cihai/cihaidata-unihan |
Docs | https://cihaidata-unihan.git-pull.com |
Changelog | https://cihaidata-unihan.git-pull.com/en/latest/history.html |
API | https://cihaidata-unihan.git-pull.com/en/latest/api.html |
Issues | https://github.com/cihai/cihaidata-unihan/issues |
Travis | https://travis-ci.org/cihai/cihaidata-unihan |
Test coverage | https://codecov.io/gh/cihai/cihaidata-unihan |
pypi | https://pypi.python.org/pypi/cihaidata-unihan |
OpenHub | https://www.openhub.net/p/cihaidata-unihan |
License | MIT. |
git repo | ^{pr 5}$ |
install dev | ^{pr 6}$ |
tests | ^{pr 7}$ |