CLaF: Clova Language Framework




CLaF is a language framework built on top of PyTorch that provides the following two high-level features:

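  • Experiment enables control of the training flow in general NLP by offering various methods.
    • CLaF is inspired by the design principles of AllenNLP, such as higher-level concepts and reusable code, but it is mostly based on PyTorch's common modules, so users can easily modify the code to suit their needs.
  • Machine helps combine various modules to build an NLP machine in one place.
    • It brings together knowledge-based modules, components, and trained experiments, which perform 1-example inference within the modules.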


Installation

Requirements

  • Python 3.6
  • PyTorch >= 0.4.1
  • MeCab for the Korean tokenizer
    • sh script/install_mecab.sh

Using a virtual environment is recommended.
Conda is the easiest way to set one up.

conda create -n claf python=3.6
conda activate claf

(claf) ✗ pip install -r requirements.txt
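Before installing, it can help to check the active environment against the requirements listed above. The sketch below is an illustration only and is not part of CLaF: the helper names `parse_version` and `check_requirements` are hypothetical, it treats the PyTorch requirement as a minimum version (an assumption), and it only understands simple dotted version strings such as `0.4.1`.

```python
import importlib
import sys
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    # Keep only the leading numeric components:
    # "0.4.1.post2" -> (0, 4, 1), "1.0.0+cpu" -> (1, 0, 0).
    parts = []
    for piece in text.split("+")[0].split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_requirements() -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] != (3, 6):
        problems.append("expected Python 3.6, found %d.%d" % sys.version_info[:2])
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        problems.append("PyTorch is not installed")
    else:
        # Assumption: 0.4.1 is treated as the minimum supported PyTorch version.
        if parse_version(str(torch.__version__)) < (0, 4, 1):
            problems.append("expected PyTorch >= 0.4.1, found " + str(torch.__version__))
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("[WARN]", problem)
```

Running the script inside the `claf` environment prints one warning line per unmet requirement and nothing when everything matches.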

Install via pip

Command to install via pip:

pip install claf

Overview

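  • Multilingual modeling support (currently English and Korean are supported).
  • Lightweight systemization and modularization.
  • Easy extension and implementation of models.
  • A wide variety of experiments with reproducible and comprehensive logging.
  • Metrics for serving, such as "1-example inference latency", are provided.
  • Easy construction of an NLP machine by combining modules.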

Experiment

  • Training Flow


Features


Usage

Training


  1. Arguments only

    python train.py --train_file_path {file_path} --valid_file_path {file_path} --model_name {name} ...
    
  2. BaseConfig only (skip the /base_config path prefix)

    python train.py --base_config {base_config}
    
  3. BaseConfig + arguments

    python train.py --base_config {base_config} --learning_rate 0.002
    
    • Loads the BaseConfig, then overrides learning_rate with 0.002
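The "BaseConfig + arguments" behaviour in item 3 can be pictured with the minimal sketch below. It is not CLaF's actual train.py logic: the flag subset, the flat dictionary config (real CLaF configs are nested), and the helper names `resolve_base_config_path`, `parse_args`, `merge_config`, and `load_config` are all hypothetical.

```python
import argparse
import copy
import json
import os
from typing import Any, Dict, List, Optional


def resolve_base_config_path(name: str, root: str = "base_config") -> str:
    # Hypothetical short-name lookup: "squad/bidaf" -> "base_config/squad/bidaf.json".
    if not name.endswith(".json"):
        name += ".json"
    return os.path.join(root, name)


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # A few illustrative flags only; default=None means "not given on the CLI".
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_config", default=None)
    parser.add_argument("--model_name", default=None)
    parser.add_argument("--learning_rate", type=float, default=None)
    return parser.parse_args(argv)


def merge_config(base: Dict[str, Any], args: argparse.Namespace) -> Dict[str, Any]:
    # Copy the base config, then let every explicitly passed argument win.
    config = copy.deepcopy(base)
    for key, value in vars(args).items():
        if key != "base_config" and value is not None:
            config[key] = value
    return config


def load_config(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    args = parse_args(argv)
    base: Dict[str, Any] = {}
    if args.base_config:
        with open(resolve_base_config_path(args.base_config), encoding="utf-8") as f:
            base = json.load(f)
    return merge_config(base, args)
```

For example, `merge_config({"model_name": "bidaf", "learning_rate": 0.001}, parse_args(["--learning_rate", "0.002"]))` yields `{"model_name": "bidaf", "learning_rate": 0.002}`: the base value survives unless the command line sets it. Using `None` as the "unset" marker is what lets an explicit argument override the file without every default silently clobbering it.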

BaseConfig

Declarative experiment config (.json)

  • Simply matches the parameters of each object
  • Examples exist in the base_config/ directory
Base Config:
  --base_config BASE_CONFIG
    Use pre-defined base_config:

    * SQuAD:
    ['squad/bert_large_uncased', 'squad/bidaf', 'squad/drqa_paper', 'squad/drqa', 'squad/bert_base_uncased', 'squad/qanet', 'squad/docqa+elmo', 'squad/bidaf_no_answer', 'squad/docqa_no_answer', 'squad/qanet_paper', 'squad/bidaf+elmo', 'squad/docqa']

    * KorQuAD:
    ['korquad/bidaf', 'korquad/docqa']

    * WikiSQL:
    ['wikisql/sqlnet']

    * CoLA:
    ['cola/bert_large_uncased', 'cola/structured_self_attention']

    * CoNLL 2003:
    ['conll2003/bert_large_cased']
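"Simply matches the parameters of each object" means that the keys of a config section line up with a class's constructor arguments. The sketch below shows that idea with a hypothetical registry and a hypothetical `ToyReader` class; it is an illustration of the pattern, not CLaF's internal code or its real model names.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical registry: maps the "name" field of a config section to a class.
REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[type], type]:
    def decorator(cls: type) -> type:
        REGISTRY[name] = cls
        return cls
    return decorator


@register("toy_reader")
class ToyReader:
    # Hypothetical model: these __init__ parameters are what the JSON keys must match.
    def __init__(self, hidden_dim: int, dropout: float = 0.1) -> None:
        self.hidden_dim = hidden_dim
        self.dropout = dropout


def build(section: Dict[str, Any]) -> Any:
    # Everything except "name" is passed through as keyword arguments,
    # so a misspelled key fails loudly with a TypeError.
    params = dict(section)  # copy so the caller's config is not mutated
    cls = REGISTRY[params.pop("name")]
    return cls(**params)


config = json.loads('{"model": {"name": "toy_reader", "hidden_dim": 128}}')
model = build(config["model"])
```

Omitted keys fall back to the constructor defaults (`dropout` stays 0.1 here), which keeps the JSON files short while still documenting every tunable value in one place.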

Evaluate

python eval.py <data_path> <model_checkpoint_path>
  • Example
✗ python eval.py data/squad/dev-v1.1.json logs/squad/bidaf/checkpoint/model_19.pkl
...
[INFO] - {
    "valid/loss": 2.59111491665019,
    "valid/epoch_time": 60.7434446811676,
    "valid/start_acc": 63.17880794701987,
    "valid/end_acc": 67.19016083254493,
    "valid/span_acc": 54.45600756859035,
    "valid/em": 68.10785241248817,
    "valid/f1": 77.77963381714842
}
# write predictions files (<log_dir>/predictions/predictions-valid-19.json)
  • 1-example inference latency (Summary)
✗ python eval.py data/squad/dev-v1.1.json logs/squad/bidaf/checkpoint/model_19.pkl
...
# Evaluate Inference Latency Mode.
...
[INFO] - saved inference_latency results. bidaf-cpu.json  # file_format: {model_name}-{env}.json
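The 1-example latency mode times a model on one input at a time, which is what a serving endpoint actually experiences. The sketch below shows one way to collect such a summary; `measure_latency` and `save_latency` are hypothetical helpers, the warm-up count and the nearest-rank p95 are assumptions, and CLaF's own latency report may contain different fields.

```python
import json
import math
import os
import statistics
import time
from typing import Any, Callable, Dict, Sequence


def measure_latency(predict: Callable[[Any], Any], examples: Sequence[Any],
                    warmup: int = 3) -> Dict[str, float]:
    """Run `predict` on one example at a time and summarize wall-clock latency in ms."""
    if not examples:
        raise ValueError("need at least one example")
    for example in examples[:warmup]:
        predict(example)  # warm-up calls are not included in the statistics
    timings_ms = []
    for example in examples:
        start = time.perf_counter()
        predict(example)
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    timings_ms.sort()
    p95_index = max(0, math.ceil(0.95 * len(timings_ms)) - 1)  # nearest-rank p95
    return {
        "count": float(len(timings_ms)),
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "p95_ms": timings_ms[p95_index],
        "max_ms": timings_ms[-1],
    }


def save_latency(summary: Dict[str, float], model_name: str, env: str,
                 out_dir: str = ".") -> str:
    # Mirrors the "{model_name}-{env}.json" naming shown in the log line above.
    path = os.path.join(out_dir, "{}-{}.json".format(model_name, env))
    with open(path, "w", encoding="utf-8") as f:
        json.dump(summary, f, indent=2)
    return path
```

Reporting the median and p95 alongside the mean matters for serving, because a few slow outliers (for example, long contexts) can dominate user-visible latency while barely moving the average.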

Predict

python predict.py <model_checkpoint_path> --<arguments>
  • Example
✗ python predict.py logs/squad/bidaf/checkpoint/model_19.pkl \
    --question "When was the last Super Bowl in California?" \
    --context "On May 21, 2013, NFL owners at their spring meetings in Boston voted and awarded the game to Levi's Stadium. The $1.2 billion stadium opened in 2014. It is the first Super Bowl held in the San Francisco Bay Area since Super Bowl XIX in 1985, and the first in California since Super Bowl XXXVII took place in San Diego in 2003."

>>> Predict: {'text': '2003', 'score': 4.1640071868896484}

Docker Images

  • Docker Hub
  • Run with the Docker image
    • Pull the Docker image: ✗ docker pull claf/claf:latest
    • Run: docker run --rm -i -t claf/claf:latest /bin/bash

Machine

  • Machine Architecture


Usage

  • Define a config file (.json), like a BaseConfig, in the machine_config/ directory
  • Run CLaF Machine (skip the /machine_config path prefix)
✗ python machine.py --machine_config {machine_config}
  • The list of pre-defined Machines:
Machine Config:
  --machine_config MACHINE_CONFIG
    Use pre-defined machine_config (.json)

    ['ko_wiki', 'nlu']

Open QA (DrQA)

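DrQA is a reading comprehension system applied to open-domain question answering. The system has to combine the challenges of document retrieval (finding the relevant documents) with those of machine comprehension of text (identifying the answers from those documents).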
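The two stages described above (retrieve, then read) can be sketched as follows. This is a toy illustration, not CLaF's or DrQA's implementation: it ranks documents with plain unigram TF-IDF and whitespace tokenization (DrQA itself uses hashed bigram TF-IDF, and Korean text would need a proper tokenizer such as MeCab), and the `reader` argument is a hypothetical stand-in for a trained reading-comprehension model.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive whitespace tokenizer, for illustration only.
    return text.lower().split()


class TfidfRetriever:
    """Toy unigram TF-IDF document ranker."""

    def __init__(self, docs: Dict[str, str]) -> None:
        self.docs = docs
        self.tfs = {title: Counter(tokenize(body)) for title, body in docs.items()}
        df = Counter(term for tf in self.tfs.values() for term in tf)
        n_docs = len(docs)
        # Smoothed IDF: rarer terms get larger weights.
        self.idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    def search(self, query: str, k: int = 5) -> List[Tuple[str, float]]:
        terms = tokenize(query)
        scores = {
            title: sum(tf[t] * self.idf.get(t, 0.0) for t in terms)
            for title, tf in self.tfs.items()
        }
        return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


def open_qa(question: str, retriever: TfidfRetriever,
            reader: Callable[[str, str], Tuple[str, float]],
            k: int = 5) -> List[Dict[str, object]]:
    """Retrieve the top-k documents, let `reader` extract one answer from each, rank by score."""
    answers = []
    for title, _ in retriever.search(question, k):
        text, score = reader(question, retriever.docs[title])
        answers.append({"text": text, "score": score})
    return sorted(answers, key=lambda a: a["score"], reverse=True)
```

This mirrors the shape of the ko_wiki output below: a ranked list of document scores from the retriever, followed by a ranked list of answer spans from the reader.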

  • ko_wiki: Korean Wiki version
✗ python machine.py --machine_config ko_wiki
...
Completed!
Question > 동학의 2대 교주 이름은?
--------------------------------------------------
Doc Scores:
 - 교주 : 0.5347289443016052
 - 이교주 : 0.4967213571071625
 - 교주도 : 0.49036136269569397
 - 동학 : 0.4800325632095337
 - 동학중학교 : 0.4352934956550598
--------------------------------------------------
Answer: [
    {
        "text": "최시형",
        "score": 11.073444366455078
    },
    {
        "text": "충주목",
        "score": 9.443866729736328
    },
    {
        "text": "반월동",
        "score": 9.37778091430664
    },
    {
        "text": "환조 이자춘",
        "score": 4.64817476272583
    },
    {
        "text": "합포군",
        "score": 3.3186707496643066
    }
]

NLU (Dialog)

✗ python machine.py --machine_config nlu
...
Utterance > "looking for a flight from Boston to Seoul or Incheon"

NLU Result: {
    "intent": "flight",
    "slots": {
        "city.depart": ["Boston"],
        "city.dest": ["Seoul", "Incheon"]
    }
}
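In the NLU result, each slot name maps to a list of values, so one utterance can fill the same slot more than once (here, two destinations). A common way to produce that structure is to decode per-token BIO tags. The sketch below assumes that tagging scheme, which this README does not confirm for CLaF's NLU machine, and the function name `bio_to_slots` is hypothetical.

```python
from typing import Dict, List, Optional


def bio_to_slots(tokens: List[str], tags: List[str]) -> Dict[str, List[str]]:
    """Group B-/I- tagged tokens into slot values; "O" tokens are ignored."""
    slots: Dict[str, List[str]] = {}
    current_name: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the span collected so far (if any) under its slot name.
        if current_name is not None:
            slots.setdefault(current_name, []).append(" ".join(current_tokens))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_name):
            flush()  # a new span starts (a stray I- tag is treated as a start)
            current_name, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the current span
        else:
            flush()
            current_name, current_tokens = None, []
    flush()
    return slots
```

Tagging "Boston" as `B-city.depart` and both "Seoul" and "Incheon" as `B-city.dest` yields `{"city.depart": ["Boston"], "city.dest": ["Seoul", "Incheon"]}`, the same shape as the output above.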

Contributing

Thanks for your interest in contributing! There are many ways to contribute to this project.
Get started here.

Maintainers

CLaF is currently maintained by

Citation

If you use CLaF in your work, please cite:

@misc{CL,
  author = {Lee, Dongjun and Yang, Sohee and Kim, Minjeong},
  title = {CLaF: Open-Source Clova Language Framework},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/naver/claf}}
}

We will update this BibTeX entry with our paper.

Acknowledgements

The docs/ directory, which contains documentation created with Sphinx.

License

MIT License

Copyright (c) 2019-present NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy 
of this software and associated documentation files (the "Software"), to deal 
in the Software without restriction, including without limitation the rights 
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 
copies of the Software, and to permit persons to whom the Software is 
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all 
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 
SOFTWARE.
