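A small package for preprocessing German text
spacy-german-preprocess: detailed Python project description
Preprocessing
Installation: This project uses pipenv to manage its dependencies. You can install all requirements with the following commands: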
$ pipenv install
$ pipenv shell
$ pipenv run python -m spacy download de
Still to do:
- Edit the stop word list
- Edit the token list
- Possibly extend the custom lemmatization JSON file (a lot of work for little gain?)
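
To make the to-do items above more concrete, here is a minimal sketch of how a stop word list and a custom lemmatization JSON mapping could work together. It uses only the standard library and is not this package's actual code: the names `STOP_WORDS`, `CUSTOM_LEMMAS_JSON` and `preprocess`, the sample entries, and the JSON shape (lower-cased surface form mapped to lemma) are all assumptions made for illustration.

```python
import json
import re

# Hypothetical stop word list; the project's real list is not shown here.
STOP_WORDS = {"der", "die", "das", "und", "ist", "in", "den"}

# Hypothetical custom lemma mapping, in the shape a JSON file might take:
# lower-cased surface form -> lemma.
CUSTOM_LEMMAS_JSON = '{"häuser": "Haus", "ging": "gehen", "kinder": "Kind"}'


def preprocess(text, stop_words, lemmas):
    """Tokenize, drop stop words, and map tokens to lemmas where known."""
    tokens = re.findall(r"\w+", text)  # \w matches umlauts on Python 3 str
    result = []
    for token in tokens:
        key = token.lower()
        if key in stop_words:
            continue  # skip stop words entirely
        # Fall back to the original token if no custom lemma is listed.
        result.append(lemmas.get(key, token))
    return result


lemmas = json.loads(CUSTOM_LEMMAS_JSON)
sentence = "Die Kinder gingen in die Häuser und das Kind ging."
print(preprocess(sentence, STOP_WORDS, lemmas))
# → ['Kind', 'gingen', 'Haus', 'Kind', 'gehen']
```

Note that "gingen" passes through unchanged because the sample mapping has no entry for it. This illustrates the trade-off mentioned in the last to-do item: a hand-maintained mapping only covers the forms someone has added.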
This project uses Spacy-IWNLP lemmatization:
@InProceedings{liebeck-conrad:2015:ACL-IJCNLP,
author = {Liebeck, Matthias and Conrad, Stefan},
title = {{IWNLP: Inverse Wiktionary for Natural Language Processing}},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
year = {2015},
publisher = {Association for Computational Linguistics},
pages = {414--418},
url = {http://www.aclweb.org/anthology/P15-2068}
}