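DeBERTa Python package - PyPI

Decoding-enhanced BERT with Disentangled Attention

DeBERTa Python project description

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

This repository is the official implementation of DeBERTa: Decoding-enhanced BERT with Disentangled Attention.

News

June 13, 2020

We released the pre-trained models, source code, and fine-tuning scripts to reproduce some of the experimental results in the paper. You can follow similar scripts to apply DeBERTa to your own experiments or applications. Pre-training scripts will be released in the next step.

Introduction to DeBERTa

DeBERTa (Decoding-enhanced BERT with disentangled attention) improves the BERT and RoBERTa models with two novel techniques. The first is the disentangled attention mechanism: each word is represented by two vectors that encode its content and its position, respectively, and the attention weights among words are computed with disentangled matrices on their contents and relative positions. The second is an enhanced mask decoder that replaces the output softmax layer to predict the masked tokens during model pre-training. We show that these two techniques significantly improve the efficiency of model pre-training and the performance on downstream tasks.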
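To make the disentangled attention idea concrete, here is a minimal sketch in plain Python. It is not the repository's implementation. It assumes the formulation described in the DeBERTa paper: a content-to-content term, a content-to-position term, and a position-to-content term, summed and scaled by 1/sqrt(3d). The function names and the use of already-projected vectors as inputs are illustrative assumptions.

```python
import math


def relative_bucket(i, j, k):
    """Map the signed distance i - j to an index in [0, 2k).

    Distances at or beyond -k and +k are clipped (assumed bucketing scheme).
    """
    d = i - j
    if d <= -k:
        return 0
    if d >= k:
        return 2 * k - 1
    return d + k


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def disentangled_scores(q_c, k_c, q_r, k_r, k):
    """Unnormalized attention scores for a sequence of n tokens.

    q_c, k_c: content query/key vectors, one per token (n vectors of size d).
    q_r, k_r: relative-position query/key vectors (2k vectors of size d).
    """
    n = len(q_c)
    d = len(q_c[0])
    scale = 1.0 / math.sqrt(3 * d)
    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            c2c = dot(q_c[i], k_c[j])                         # content-to-content
            c2p = dot(q_c[i], k_r[relative_bucket(i, j, k)])  # content-to-position
            p2c = dot(k_c[j], q_r[relative_bucket(j, i, k)])  # position-to-content
            scores[i][j] = (c2c + c2p + p2c) * scale
    return scores
```

A softmax over each row of the returned matrix would turn these scores into attention weights.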

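Pre-trained models

Our pre-trained models are packaged into zipped files. You can download them from our releases, or download an individual model via the links below:

Large: the pre-trained Large model
Base: the pre-trained Base model
Large MNLI: the Large model fine-tuned on the MNLI task
Base MNLI: the Base model fine-tuned on the MNLI task

Try the code

Read our documentation

Requirements

Linux system, e.g. Ubuntu 18.04 LTS
CUDA 10.0
PyTorch 1.3.0
Python 3.6
bash shell 4.0
curl
docker (optional)
nvidia-docker2 (optional)

There are several ways to try our code

Use docker

Docker is the recommended way to run the code, since we have already built every dependency into our Docker image bagai/deberta. You can follow the docker official site to install Docker on your machine.

To run with docker, make sure your system fulfills the requirements in the list above. Here are the steps to try the GLUE experiments: pull the code, run ./run_docker.sh, and then run the bash commands under /DeBERTa/experiments/glue/.

Use pip

Pull the code and run pip3 install -r requirements.txt in the root directory of the code, then enter the experiments/glue/ folder of the code and try the bash commands under that folder for the GLUE experiments.

Install as a pip package

pip install deberta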

Use DeBERTa in existing code

# To apply DeBERTa to your existing code, you need to make two changes.
# 1. Change your model to consume DeBERTa as the encoder
from DeBERTa import deberta
import torch

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Your existing model code
        self.bert = deberta.DeBERTa(pre_trained='base')  # Or 'large' or 'base_mnli' or 'large_mnli'
        # Your existing model code
        # Do initialization as before
        self.bert.apply_state()  # Apply the pre-trained model of DeBERTa at the end of the constructor

    def forward(self, input_ids):
        # The inputs to DeBERTa forward are
        # `input_ids`: a torch.LongTensor of shape [batch_size, sequence_length] with the word token indices in the vocabulary
        # `token_type_ids`: an optional torch.LongTensor of shape [batch_size, sequence_length] with the token type indices selected in [0, 1].
        #    Type 0 corresponds to a `sentence A` token and type 1 corresponds to a `sentence B` token (see the BERT paper for more details).
        # `attention_mask`: an optional parameter for input mask or attention mask.
        #   - If it's an input mask, it will be a torch.LongTensor of shape [batch_size, sequence_length] with indices selected in [0, 1].
        #      It's a mask to be used if the input sequence length is smaller than the max input sequence length in the current batch.
        #      It's the mask that we typically use for attention when a batch has varying length sentences.
        #   - If it's an attention mask, it will be a torch.LongTensor of shape [batch_size, sequence_length, sequence_length].
        #      In this case, it's a mask indicating which tokens in the sequence should be attended to by other tokens in the sequence.
        # `output_all_encoded_layers`: whether to output the results of all encoder layers, default True
        encoding = self.bert(input_ids)[-1]

# 2. Replace your tokenizer with the tokenizer built into DeBERTa
from DeBERTa import deberta
tokenizer = deberta.GPT2Tokenizer()
# We apply the same schema of special tokens as BERT, e.g. [CLS], [SEP], [MASK]
max_seq_len = 512
tokens = tokenizer.tokenize('Examples input text of DeBERTa')
# Truncate long sequences, leaving room for the two special tokens added below
tokens = tokens[:max_seq_len - 2]
# Add special tokens to the `tokens`
tokens = ['[CLS]'] + tokens + ['[SEP]']
input_ids = tokenizer.convert_tokens_to_ids(tokens)
input_mask = [1] * len(input_ids)
# Padding
paddings = max_seq_len - len(input_ids)
input_ids = input_ids + [0] * paddings
input_mask = input_mask + [0] * paddings
features = {
    'input_ids': torch.tensor(input_ids, dtype=torch.int),
    'input_mask': torch.tensor(input_mask, dtype=torch.int)
}
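The truncation and padding steps above are easy to get wrong by two positions, because [CLS] and [SEP] also count toward the maximum length. The following sketch wraps those steps in a helper so the length arithmetic can be checked in isolation. The helper name, the `token_to_id` callable, and the toy vocabulary are hypothetical stand-ins for the real tokenizer, and plain lists are used instead of tensors.

```python
def build_features(tokens, token_to_id, max_seq_len, pad_id=0):
    """Truncate, add [CLS]/[SEP], convert to ids, and pad to max_seq_len."""
    # Reserve two positions for the special tokens.
    tokens = tokens[: max_seq_len - 2]
    tokens = ["[CLS]"] + tokens + ["[SEP]"]
    input_ids = [token_to_id(t) for t in tokens]
    input_mask = [1] * len(input_ids)
    # Pad both lists up to the fixed length.
    padding = max_seq_len - len(input_ids)
    input_ids = input_ids + [pad_id] * padding
    input_mask = input_mask + [0] * padding
    return {"input_ids": input_ids, "input_mask": input_mask}


# Toy vocabulary standing in for the real tokenizer (hypothetical ids).
toy_vocab = {"[CLS]": 1, "[SEP]": 2, "hello": 3, "world": 4}
features = build_features(["hello", "world"], toy_vocab.__getitem__, max_seq_len=6)
print(features)
```

With the real tokenizer, `tokenizer.convert_tokens_to_ids` from the snippet above would replace the per-token lookup.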

Run DeBERTa experiments from the command line

For GLUE tasks

Get the data

^{pr2}$

Run the task

task=STS-B
OUTPUT=/tmp/DeBERTa/exps/$task
export OMP_NUM_THREADS=1
python3 -m DeBERTa.apps.train --task_name $task --do_train \
  --data_dir $cache_dir/glue_tasks/$task \
  --eval_batch_size 128 \
  --predict_batch_size 128 \
  --output_dir $OUTPUT \
  --scale_steps 250 \
  --loss_scale 16384 \
  --accumulative_update 1 \
  --num_train_epochs 6 \
  --warmup 100 \
  --learning_rate 2e-5 \
  --train_batch_size 32 \
  --max_seq_len 128
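If you prefer to launch the same run from Python, the command can be assembled as an argument list. This is only a sketch: the flags and values are copied from the shell command above, while the function name and its parameters are illustrative assumptions. The value of `cache_dir` is not given in this section, so it is left as a parameter.

```python
import os


def build_train_command(task, cache_dir, output_dir):
    """Return (argv, env) mirroring the shell command for one GLUE task."""
    argv = [
        "python3", "-m", "DeBERTa.apps.train",
        "--task_name", task, "--do_train",
        "--data_dir", os.path.join(cache_dir, "glue_tasks", task),
        "--eval_batch_size", "128",
        "--predict_batch_size", "128",
        "--output_dir", output_dir,
        "--scale_steps", "250",
        "--loss_scale", "16384",
        "--accumulative_update", "1",
        "--num_train_epochs", "6",
        "--warmup", "100",
        "--learning_rate", "2e-5",
        "--train_batch_size", "32",
        "--max_seq_len", "128",
    ]
    # Copy the current environment and pin OMP_NUM_THREADS, as the shell version does.
    env = dict(os.environ, OMP_NUM_THREADS="1")
    return argv, env


# Usage (not executed here, since it starts a training run):
# import subprocess
# argv, env = build_train_command("STS-B", cache_dir, "/tmp/DeBERTa/exps/STS-B")
# subprocess.run(argv, env=env, check=True)
```

Passing the arguments as a list avoids shell quoting problems such as stray whitespace after a line-continuation backslash.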

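Important notes

To run our code on multiple GPUs, you must set OMP_NUM_THREADS=1 before launching our training code.
By default, we cache the pre-trained models and tokenizers at $HOME/.~DeBERTa; you may need to clean it up if a download fails unexpectedly.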
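Cleaning up after a failed download can be scripted. The sketch below removes a cache directory if it exists. The helper name is hypothetical, and the default path is simply the location named in the note above expanded for the current user. Check the path before deleting anything.

```python
import os
import shutil


def clear_model_cache(cache_dir):
    """Delete cache_dir if it exists; return True when something was removed."""
    if os.path.isdir(cache_dir):
        shutil.rmtree(cache_dir)
        return True
    return False


# Default cache location taken from the note above (assumed to expand like this).
DEFAULT_CACHE = os.path.join(os.path.expanduser("~"), ".~DeBERTa")

# Usage (commented out because it deletes files):
# clear_model_cache(DEFAULT_CACHE)
```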

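Experiments

Our fine-tuning experiments were carried out on a DGX-2 node with 8x32G V100 GPU cards. Results may vary with the GPU model, driver, CUDA SDK version, use of FP16 or FP32, and random seed. The numbers we report are based on multiple runs with different random seeds. Here are the results from the Large model: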

Task	Command	Results	Running Time(8x32G V100 GPUs)
MNLI large	^{}	91.2/91.0 +/-0.1	2.5h
QQP large	^{}	92.3 +/-0.1	6h
QNLI large	^{}	95.3 +/-0.2	2h
MRPC large	^{}	93.4 +/-0.5	0.5h
RTE large	^{}	87.7 +/-1.0	0.5h
SST-2 large	^{}	96.7 +/-0.3	1h
STS-B large	^{}	92.5 +/-0.3	0.5h
CoLA large	^{}	70.5 +/-1.0	0.5h
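The results above are written as a central value followed by "+/-" and a spread across runs with different seeds. This section does not state which spread statistic is used, so the sketch below assumes the sample standard deviation. The function name is hypothetical, and the scores are made-up placeholders, not results from the paper.

```python
import statistics


def summarize_runs(scores):
    """Format per-seed scores as 'mean +/-spread', rounded to one decimal."""
    mean = statistics.mean(scores)
    # Sample standard deviation needs at least two runs.
    spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return f"{mean:.1f} +/-{spread:.1f}"


# Hypothetical scores from three seeds, for illustration only.
print(summarize_runs([90.0, 91.0, 92.0]))  # prints "91.0 +/-1.0"
```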

Here are the results from the Base model:

^{tb2}$

Contacts

Pengcheng He (penhe@microsoft.com), Xiaodong Liu (xiaodl@microsoft.com), Jianfeng Gao (jfgao@microsoft.com), Weizhu Chen (wzchen@microsoft.com)

Citation

@misc{he2020deberta,
    title={DeBERTa: Decoding-enhanced BERT with Disentangled Attention},
    author={Pengcheng He and Xiaodong Liu and Jianfeng Gao and Weizhu Chen},
    year={2020},
    eprint={2006.03654},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

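Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

git version: 4841fbfdef8e4e169c3b85aacb4290645e320287 Date: 2020-08-07 00:11:48.409337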

DeBERTa 0.1.8
