Convert FoLiA and TEI files to Alpino XML files
FoLiA and TEI to Alpino XML
Converts FoLiA and TEI XML files to Alpino XML files. Each sentence in the input files is parsed separately.
Usage
Command line
pip install corpus2alpino
corpus2alpino -s localhost:7001 folia.xml -o alpino.xml
Or from the project root:
python -m corpus2alpino -s localhost:7001 folia.xml -o alpino.xml
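When several files need converting, the command above can be assembled from Python. The following is a minimal sketch that only builds the argument list shown in this section; `build_command` is a hypothetical helper name, not part of corpus2alpino, and it assumes one input file per run.

```python
import shlex


def build_command(input_path, output_path, server="localhost:7001"):
    """Return the argument list for one corpus2alpino run.

    Hypothetical helper: it only mirrors the flags shown in this README
    (-s for the Alpino server, -o for the output file).
    """
    return ["corpus2alpino", "-s", server, input_path, "-o", output_path]


command = build_command("folia.xml", "alpino.xml")
# Show the command as it would be typed in a shell.
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once corpus2alpino is installed and the Alpino server is running.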
Library
from corpus2alpino.converter import Converter
from corpus2alpino.annotators.alpino import AlpinoAnnotator
from corpus2alpino.collectors.filesystem import FilesystemCollector
from corpus2alpino.targets.memory import MemoryTarget
from corpus2alpino.writers.lassy import LassyWriter

alpino = AlpinoAnnotator("localhost", 7001)
converter = Converter(
    FilesystemCollector(["folia.xml"]),
    # Not needed when using the PaQuWriter
    annotators=[alpino],
    # This can also be ConsoleTarget, FilesystemTarget
    target=MemoryTarget(),
    # Set to merge treebanks, also possible to use PaQuWriter
    writer=LassyWriter(True))

# get the Alpino XML output, combined into one treebank XML file
parses = converter.convert()
print(''.join(parses))  # <treebank><alpino_ds ... /></treebank>
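The combined treebank string can then be inspected with the standard library. This is a minimal sketch, assuming the output has the `<treebank><alpino_ds ... /></treebank>` shape noted above, carries no XML namespace, and that each `alpino_ds` holds a `sentence` element. The sample string is hand-written for illustration and is not real parser output; `sentences_from_treebank` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def sentences_from_treebank(treebank_xml):
    """Return the sentence text of every alpino_ds tree in a treebank string."""
    root = ET.fromstring(treebank_xml)
    # Each direct alpino_ds child is the parse of one sentence.
    return [tree.findtext("sentence", default="") for tree in root.findall("alpino_ds")]


# Hand-written stand-in for ''.join(parses); real output also contains <node> trees.
sample = (
    "<treebank>"
    "<alpino_ds><sentence>Dit is een zin .</sentence></alpino_ds>"
    "<alpino_ds><sentence>Nog een zin .</sentence></alpino_ds>"
    "</treebank>"
)

for sentence in sentences_from_treebank(sample):
    print(sentence)
```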
Unit tests
python -m unittest
Upload to PyPI
See: https://packaging.python.org/tutorials/packaging-projects/#generating-distribution-archives
Make sure setuptools and wheel are installed. Then, from the virtualenv:
python setup.py sdist bdist_wheel
twine upload dist/*
Requirements
- Alpino parser running as a server:
Alpino batch_command=alpino_server -notk server_port=7001
- Python 3.6 or higher (developed using 3.6.3).
- libfolia-dev
- libicu-dev
- libxml2-dev
- libticcutils2-dev
- libucto-dev
- ucto
- tqdm
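Because the converter depends on a running Alpino server, it can help to check that the port is reachable before starting a long conversion. The sketch below uses only the standard library; `server_is_listening` is a hypothetical helper, and the default host and port are taken from the examples in this README. A successful connection only shows that something is listening on that port, not that it is Alpino.

```python
import socket


def server_is_listening(host="localhost", port=7001, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # Open and immediately close a connection; no data is sent.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


if __name__ == "__main__":
    if server_is_listening():
        print("Something is listening on localhost:7001")
    else:
        print("No server reachable on localhost:7001")
```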