CLaF: Clova Language Framework
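CLaF is a language framework built on PyTorch that provides the following two high-level features:
Experiment
Controls the training flow of general NLP tasks by providing a range of methods.
- CLaF's design is inspired by the design principles of AllenNLP, such as higher-level concepts and reusable code, but it is mostly built on PyTorch's common modules, so users can easily modify the code to suit their needs.
Machine
Helps combine various modules to build an NLP machine in one place.
- A machine is assembled from knowledge-based modules, components, and trained experiments, and infers one example at a time through those modules.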
Installation
Requirements
- Python 3.6
- PyTorch 0.4.1
- MeCab for the Korean tokenizer
sh script/install_mecab.sh
Using a virtual environment is recommended.
Conda is the easiest way to set one up.
conda create -n claf python=3.6
conda activate claf
(claf) ✗ pip install -r requirements.txt
Install via pip
Run the following command to install CLaF via pip:
pip install claf
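Overview
- Multilingual modeling support (currently English and Korean).
- Lightweight, systematized, and modular.
- Models are easy to extend and implement.
- A wide variety of experiments with reproducible and comprehensive logging.
- Service-oriented metrics such as "1-example inference latency" are provided.
- An NLP Machine is easy to build by combining modules.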
Experiment
- Training Flow
Features
Usage
Training
Only arguments
python train.py --train_file_path {file_path} --valid_file_path {file_path} --model_name {name} ...
Only BaseConfig (omit the /base_config path)
python train.py --base_config {base_config}
BaseConfig + arguments
python train.py --base_config {base_config} --learning_rate 0.002
- Loads the BaseConfig, then overrides learning_rate with 0.002
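The "BaseConfig + arguments" behaviour above can be illustrated with a minimal sketch. It is not CLaF's actual loader: the function name load_config and the two flags it registers are assumptions made for illustration. The idea is that only flags given explicitly on the command line replace values from the JSON file.

```python
import argparse
import json
import tempfile
from pathlib import Path


def load_config(base_config_path, argv):
    """Load a JSON base config, then let explicitly passed flags win.

    Hypothetical helper for illustration; CLaF's real loader may differ.
    """
    with open(base_config_path, encoding="utf-8") as f:
        config = json.load(f)

    # SUPPRESS keeps flags that were not passed out of the namespace,
    # so they cannot clobber values from the base config.
    parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS)
    parser.add_argument("--learning_rate", type=float)
    parser.add_argument("--model_name", type=str)
    overrides = vars(parser.parse_args(argv))

    config.update(overrides)
    return config


# Demo with a throwaway base config file.
demo_path = Path(tempfile.mkdtemp()) / "demo.json"
demo_path.write_text(json.dumps({"model_name": "bidaf", "learning_rate": 0.001}))
print(load_config(demo_path, ["--learning_rate", "0.002"]))
# {'model_name': 'bidaf', 'learning_rate': 0.002}
```

Using argparse.SUPPRESS as the parser-wide default is what separates "not given" from "given with the default value", which a plain default=None check would also do but less directly.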
BaseConfig
Declarative experiment config (.json)
- Keys simply match the parameters of the objects they configure
Base Config:
--base_config BASE_CONFIG
Use pre-defined base_config:
* SQuAD:
['squad/bert_large_uncased', 'squad/bidaf', 'squad/drqa_paper', 'squad/drqa', 'squad/bert_base_uncased', 'squad/qanet', 'squad/docqa+elmo', 'squad/bidaf_no_answer', 'squad/docqa_no_answer', 'squad/qanet_paper', 'squad/bidaf+elmo', 'squad/docqa']
* KorQuAD:
['korquad/bidaf', 'korquad/docqa']
* WikiSQL:
['wikisql/sqlnet']
* CoLA:
['cola/bert_large_uncased', 'cola/structured_self_attention']
* CoNLL 2003:
['conll2003/bert_large_cased']
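The declarative idea behind a BaseConfig, where JSON keys match an object's parameters, can be sketched as below. The Optimizer class and build_from_config helper are hypothetical stand-ins, not CLaF classes; the sketch only shows how a config section can be checked against a constructor signature before it is used.

```python
import inspect


class Optimizer:
    """Stand-in for any configurable object; not a CLaF class."""

    def __init__(self, learning_rate=0.001, weight_decay=0.0):
        self.learning_rate = learning_rate
        self.weight_decay = weight_decay


def build_from_config(cls, params):
    """Instantiate cls from a config dict, rejecting unknown keys."""
    accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
    unknown = set(params) - accepted
    if unknown:
        raise ValueError(f"unknown config keys for {cls.__name__}: {sorted(unknown)}")
    return cls(**params)


# A config section whose keys mirror the constructor parameters.
config = {"optimizer": {"learning_rate": 0.002, "weight_decay": 0.01}}
optimizer = build_from_config(Optimizer, config["optimizer"])
print(optimizer.learning_rate)  # 0.002
```

Failing early on unknown keys catches typos in a config file before a long training run starts.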
Evaluate
python eval.py <data_path> <model_checkpoint_path>
- Example
✗ python eval.py data/squad/dev-v1.1.json logs/squad/bidaf/checkpoint/model_19.pkl
...
[INFO] - {
"valid/loss": 2.59111491665019,
"valid/epoch_time": 60.7434446811676,
"valid/start_acc": 63.17880794701987,
"valid/end_acc": 67.19016083254493,
"valid/span_acc": 54.45600756859035,
"valid/em": 68.10785241248817,
"valid/f1": 77.77963381714842
}
# write predictions files (<log_dir>/predictions/predictions-valid-19.json)
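For readers unfamiliar with the valid/em and valid/f1 values in the log above, the following sketch shows how exact match and token-level F1 are conventionally computed for SQuAD-style answers. It assumes CLaF follows the usual SQuAD v1.1 convention, which this README does not confirm; the function names are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(truth))


def f1_score(prediction, truth):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("The year 2003", "year 2003"))  # 1.0
print(round(f1_score("in 2003", "2003"), 4))      # 0.6667
```

In the SQuAD convention each question's score is the maximum over its reference answers, and the dataset-level figure is the mean over questions, reported as a percentage.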
- 1-example inference latency (Summary)
✗ python eval.py data/squad/dev-v1.1.json logs/squad/bidaf/checkpoint/model_19.pkl
...
# Evaluate Inference Latency Mode.
...
[INFO] - saved inference_latency results. bidaf-cpu.json # file_format: {model_name}-{env}.json
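As a rough illustration of what a "1-example inference latency" measurement involves, the sketch below times a prediction function one example at a time and summarizes the results. The measure_latency helper, the dummy_predict stand-in, and the summary field names are assumptions; they do not describe the contents of the JSON file that eval.py saves.

```python
import json
import statistics
import time


def measure_latency(predict_fn, examples, warmup=1):
    """Time predict_fn on each example individually (batch size 1)."""
    # Warm-up calls keep one-time setup costs out of the measurement.
    for example in examples[:warmup]:
        predict_fn(example)

    timings_ms = []
    for example in examples:
        start = time.perf_counter()
        predict_fn(example)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "count": len(timings_ms),
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


def dummy_predict(example):
    """Placeholder for a real model call."""
    return {"text": example["question"].split()[0], "score": 0.0}


examples = [{"question": "When was the last Super Bowl in California?"}] * 5
summary = measure_latency(dummy_predict, examples)
print(json.dumps(summary, indent=2))
```

time.perf_counter is used because it is a monotonic, high-resolution clock intended for measuring short durations.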
Predict
python predict.py <model_checkpoint_path> --<arguments>
- Example
✗ python predict.py logs/squad/bidaf/checkpoint/model_19.pkl \
--question "When was the last Super Bowl in California?" \
--context "On May 21, 2013, NFL owners at their spring meetings in Boston voted and awarded the game to Levi's Stadium. The $1.2 billion stadium opened in 2014. It is the first Super Bowl held in the San Francisco Bay Area since Super Bowl XIX in 1985, and the first in California since Super Bowl XXXVII took place in San Diego in 2003."
>>> Predict: {'text': '2003', 'score': 4.1640071868896484}
Docker Images
- Docker Hub
- Run with a Docker image
- Pull the Docker image
✗ docker pull claf/claf:latest
- Run
docker run --rm -i -t claf/claf:latest /bin/bash
Machine
- Machine Architecture
Usage
- Define the config file (.json), like a BaseConfig, in the machine_config/ directory
- Run the CLaF Machine (omit the /machine_config path)
✗ python machine.py --machine_config {machine_config}
- The list of pre-defined Machines:
Machine Config:
--machine_config MACHINE_CONFIG
Use pre-defined machine_config (.json (.json))
['ko_wiki', 'nlu']
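Open QA (DrQA)
DrQA is a reading comprehension system applied to open-domain question answering. The system has to combine the challenges of document retrieval (finding the relevant documents) with those of machine comprehension of text (identifying the answers from those documents).
- ko_wiki: Korean Wikipedia version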
✗ python machine.py --machine_config ko_wiki
...
Completed!
Question > 동학의 2대 교주 이름은?
--------------------------------------------------
Doc Scores:
- 교주 : 0.5347289443016052
- 이교주 : 0.4967213571071625
- 교주도 : 0.49036136269569397
- 동학 : 0.4800325632095337
- 동학중학교 : 0.4352934956550598
--------------------------------------------------
Answer: [
{
"text": "최시형",
"score": 11.073444366455078
},
{
"text": "충주목",
"score": 9.443866729736328
},
{
"text": "반월동",
"score": 9.37778091430664
},
{
"text": "환조 이자춘",
"score": 4.64817476272583
},
{
"text": "합포군",
"score": 3.3186707496643066
}
]
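The "Doc Scores" stage in the output above corresponds to document retrieval, the first half of the retrieve-then-read pipeline. The sketch below shows a much simplified version of that stage using unigram TF-IDF and cosine similarity. It is an illustration only: DrQA itself uses hashed bigram TF-IDF features, and none of the function names or sample documents here come from CLaF.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) and the IDF table for a small corpus."""
    tokenized = [doc.lower().split() for doc in docs]
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, titles, docs, k=2):
    """Return the k highest-scoring (score, title) pairs for the question."""
    vectors, idf = tfidf_vectors(docs)
    # Terms unseen in the corpus get weight 0 and do not affect the ranking.
    query = {t: c * idf.get(t, 0.0) for t, c in Counter(question.lower().split()).items()}
    scored = sorted(((cosine(query, v), title) for v, title in zip(vectors, titles)), reverse=True)
    return scored[:k]


titles = ["Super Bowl 50", "Seoul", "Boston"]
docs = [
    "super bowl 50 was played at levi's stadium in california",
    "seoul is the capital of south korea",
    "boston is a city in massachusetts",
]
ranked = retrieve("where was super bowl 50 played", titles, docs, k=2)
print(ranked[0][1])  # Super Bowl 50
```

In a full pipeline, the reader model would then run on the retrieved documents and the candidate answers would be ranked by their span scores, as in the "Answer" list above.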
NLU (Dialog)
✗ python machine.py --machine_config nlu
...
Utterance > "looking for a flight from Boston to Seoul or Incheon"
NLU Result: {
"intent": "flight",
"slots": {
"city.depart": ["Boston"],
"city.dest": ["Seoul", "Incheon"]
}
}
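A result shaped like the slots dictionary above is often produced by tagging each token and then grouping the tags into values. The sketch below assumes BIO-style tags (B- begins a slot, I- continues it, O is outside), which is a common convention but is not confirmed by this README as what the nlu machine uses; bio_to_slots is a hypothetical helper.

```python
def bio_to_slots(tokens, tags):
    """Group BIO-tagged tokens into {slot_name: [value, ...]}.

    An I- tag that does not continue the current slot is treated as outside.
    """
    slots = {}
    name, span = None, []
    # The trailing sentinel flushes a slot that ends on the last token.
    for token, tag in list(zip(tokens, tags)) + [(None, "O")]:
        continues = tag.startswith("I-") and tag[2:] == name
        if not continues:
            if name is not None:
                slots.setdefault(name, []).append(" ".join(span))
            name, span = (tag[2:], []) if tag.startswith("B-") else (None, [])
        if name is not None:
            span.append(token)
    return slots


tokens = "looking for a flight from Boston to Seoul or Incheon".split()
tags = ["O", "O", "O", "O", "O", "B-city.depart", "O", "B-city.dest", "O", "B-city.dest"]
print(bio_to_slots(tokens, tags))
# {'city.depart': ['Boston'], 'city.dest': ['Seoul', 'Incheon']}
```

The intent field would come from a separate utterance-level classifier; only the slot grouping is shown here.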
Contributing
Thanks for your interest in contributing! There are many ways to contribute to this project.
Get started here.
Maintainers
CLaF is currently maintained by
Citing
If you use CLaF in your work, please cite:
@misc{CL,
  author = {Lee, Dongjun and Yang, Sohee and Kim, Minjeong},
  title = {CLaF: Open-Source Clova Language Framework},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/naver/claf}}
}
We will update this BibTeX entry with our paper.
Acknowledgements
The docs/ directory contains documentation generated with Sphinx.
License
MIT License
Copyright (c) 2019-present NAVER Corp.
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.