Detailed description of the MatchZoo-test Python project
MatchZoo-py
PyTorch version of MatchZoo.
Facilitating the design, comparison and sharing of deep text matching models.
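MatchZoo is a general-purpose text matching toolkit that aims to help users quickly implement, compare, and share the latest deep text matching models.
The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, covering tasks such as document retrieval, question answering, conversational response ranking, and paraphrase identification. With a unified data processing pipeline, simplified model configuration, and automatic hyper-parameter tuning, MatchZoo is flexible and easy to use.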
Tasks | Text 1 | Text 2 | Objective |
---|---|---|---|
Paraphrase Identification | string 1 | string 2 | classification |
Textual Entailment | text | hypothesis | classification |
Question Answer | question | answer | classification/ranking |
Conversation | dialog | response | classification/ranking |
Information Retrieval | query | document | ranking |
Get Started in 60 Seconds
To train a Deep Structured Semantic Model (DSSM), define the task using MatchZoo's customized loss functions and evaluation metrics:

    import torch
    import matchzoo as mz

    ranking_task = mz.tasks.Ranking(losses=mz.losses.RankCrossEntropyLoss(num_neg=4))
    ranking_task.metrics = [
        mz.metrics.NormalizedDiscountedCumulativeGain(k=3),
        mz.metrics.MeanAveragePrecision()
    ]
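For readers new to the two metrics named above, here is a minimal sketch of the textbook definitions of NDCG@k and average precision in plain Python. The function names `dcg_at_k`, `ndcg_at_k`, and `average_precision` are hypothetical helpers written for this illustration, not MatchZoo's API, and MatchZoo's own implementations may differ in details such as the relevance threshold. Mean average precision is the mean of `average_precision` over all queries.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(labels, scores, k=3):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    # Order the relevance labels by descending predicted score.
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def average_precision(labels, scores, threshold=0):
    """Mean of precision@rank taken at every rank that holds a relevant item."""
    ranked = [label for _, label in
              sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)]
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked, start=1):
        if label > threshold:  # assumption: labels above the threshold are relevant
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0
```

For one query with labels `[1, 0, 0]` and scores `[0.1, 0.9, 0.5]`, the only relevant document is ranked last, so `ndcg_at_k` gives 0.5 and `average_precision` gives 1/3.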
Prepare the input data:

    train_pack = mz.datasets.wiki_qa.load_data('train', task=ranking_task)
    valid_pack = mz.datasets.wiki_qa.load_data('dev', task=ranking_task)
Preprocess the input data in three lines of code, keeping track of the parameters to be passed to the model:

    preprocessor = mz.models.DSSM.get_default_preprocessor()
    train_processed = preprocessor.fit_transform(train_pack)
    valid_processed = preprocessor.transform(valid_pack)
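The fit-then-transform pattern matters because vocabulary statistics should come from the training split only, and the fitted state (such as `vocab_size`) is what the model needs later. The sketch below illustrates the pattern with a hypothetical `TinyPreprocessor` class. It is a stand-in written for this example and does not reproduce what `DSSM.get_default_preprocessor()` does; tokenization, n-gram handling, and the layout of `context` in MatchZoo may all differ.

```python
from collections import Counter


class TinyPreprocessor:
    """Hypothetical whitespace tokenizer that learns a vocabulary on fit()."""

    PAD, OOV = 0, 1  # assumption: index 0 is padding, index 1 is out-of-vocabulary

    def __init__(self):
        self.context = {}

    def fit(self, texts):
        counts = Counter(tok for text in texts for tok in text.lower().split())
        term_index = {"<pad>": self.PAD, "<oov>": self.OOV}
        for token in sorted(counts):
            term_index[token] = len(term_index)
        # State learned from the training data, reused by transform() and the model.
        self.context["term_index"] = term_index
        self.context["vocab_size"] = len(term_index)
        return self

    def transform(self, texts):
        term_index = self.context["term_index"]
        return [[term_index.get(tok, self.OOV) for tok in text.lower().split()]
                for text in texts]

    def fit_transform(self, texts):
        return self.fit(texts).transform(texts)


pre = TinyPreprocessor()
train_ids = pre.fit_transform(["how are glaciers formed", "glaciers form from snow"])
# "caves" never appeared in the training texts, so it maps to the OOV index.
valid_ids = pre.transform(["how are caves formed"])
```

Calling `transform` on the validation split without refitting keeps the validation vocabulary from leaking into the model's input space.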
Generate pair-wise training data on the fly:

    trainset = mz.dataloader.Dataset(data_pack=train_processed, mode='pair', num_dup=1, num_neg=4)
    validset = mz.dataloader.Dataset(data_pack=valid_processed, mode='point')
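To make the roles of `num_dup` and `num_neg` more concrete, the following sketch groups each positive document with `num_neg` sampled negatives and scores a group with a softmax cross-entropy in which the positive sits at index 0. This is one plausible reading of how pair-mode data and a loss like `RankCrossEntropyLoss(num_neg=4)` fit together. The helpers `build_pair_groups` and `rank_cross_entropy` are hypothetical, sampling negatives with replacement is an assumption, and none of this is MatchZoo's actual implementation.

```python
import math
import random


def build_pair_groups(labelled_docs, num_dup=1, num_neg=4, seed=0):
    """Turn one query's (doc_id, label) list into [positive, neg_1..neg_n] groups."""
    rng = random.Random(seed)
    positives = [doc for doc, label in labelled_docs if label > 0]
    negatives = [doc for doc, label in labelled_docs if label <= 0]
    groups = []
    if not negatives:
        return groups
    for pos in positives:
        for _ in range(num_dup):  # each positive is repeated num_dup times
            # Assumption: negatives are drawn with replacement.
            groups.append([pos] + rng.choices(negatives, k=num_neg))
    return groups


def rank_cross_entropy(group_scores):
    """Softmax cross-entropy for one group; the positive's score is at index 0."""
    top = max(group_scores)
    # Log-sum-exp with the maximum subtracted for numerical stability.
    log_norm = top + math.log(sum(math.exp(s - top) for s in group_scores))
    return log_norm - group_scores[0]
```

With five equal scores the loss is `log(5)`, and it shrinks as the positive's score rises above the negatives' scores.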
Define the padding callback and create the data loaders:

    padding_callback = mz.models.DSSM.get_default_padding_callback()
    trainloader = mz.dataloader.DataLoader(
        dataset=trainset,
        batch_size=32,
        stage='train',
        callback=padding_callback
    )
    validloader = mz.dataloader.DataLoader(
        dataset=validset,
        batch_size=32,
        stage='dev',
        callback=padding_callback
    )
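As a rough picture of what a padding step usually does before a batch is stacked into a tensor, the sketch below pads variable-length token-id lists to a common length. The `pad_batch` function and its parameters are hypothetical and written for this example. Whether and how the callback returned by `DSSM.get_default_padding_callback()` pads its inputs is not stated here, so treat this as a generic illustration only.

```python
def pad_batch(sequences, pad_value=0, fixed_length=None):
    """Pad (or truncate) each sequence so that the whole batch shares one length."""
    if fixed_length is not None:
        target = fixed_length
    else:
        # Default: pad to the longest sequence in this batch.
        target = max((len(seq) for seq in sequences), default=0)
    padded = []
    for seq in sequences:
        clipped = list(seq[:target])
        padded.append(clipped + [pad_value] * (target - len(clipped)))
    return padded
```

For example, `pad_batch([[1, 2, 3], [4]])` returns `[[1, 2, 3], [4, 0, 0]]`, and passing `fixed_length=2` returns `[[1, 2], [4, 0]]`.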
Initialize the model and fine-tune the hyper-parameters:

    model = mz.models.DSSM()
    model.params['task'] = ranking_task
    model.params['vocab_size'] = preprocessor.context['vocab_size']
    model.guess_and_fill_missing_params()
    model.build()
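For intuition about what a DSSM-style model computes, the DSSM paper listed under Models represents each text by letter-trigram counts, maps both texts through neural layers, and compares the resulting vectors with cosine similarity. The sketch below shows that idea in plain Python with random, untrained weights. `letter_trigrams`, `TinyTower`, `cosine`, and `dssm_score` are hypothetical names, the single-layer tower is a simplification, and this is not the code behind `mz.models.DSSM`.

```python
import math
import random
from collections import Counter


def letter_trigrams(text):
    """Count letter trigrams of each word, with '#' marking word boundaries."""
    grams = Counter()
    for word in text.lower().split():
        marked = f"#{word}#"
        for i in range(len(marked) - 2):
            grams[marked[i:i + 3]] += 1
    return grams


class TinyTower:
    """Hypothetical one-layer tower: trigram counts -> tanh(linear projection)."""

    def __init__(self, vocab, dim=8, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.index = {gram: row for row, gram in enumerate(sorted(vocab))}
        # Random, untrained weights: one row per known trigram.
        self.weights = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                        for _ in self.index]

    def encode(self, text):
        vec = [0.0] * self.dim
        for gram, count in letter_trigrams(text).items():
            row = self.index.get(gram)
            if row is None:  # trigram not seen when the vocabulary was built
                continue
            for j in range(self.dim):
                vec[j] += count * self.weights[row][j]
        return [math.tanh(v) for v in vec]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def dssm_score(left_tower, right_tower, text_left, text_right):
    """Matching score: cosine of the two tower outputs."""
    return cosine(left_tower.encode(text_left), right_tower.encode(text_right))
```

In this sketch the number of rows in `weights` plays the role that a vocabulary-size parameter plays for a real model: the input layer cannot be built until the preprocessing step has fixed it.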
Trainer is used to control the training flow:

    optimizer = torch.optim.Adam(model.parameters())
    trainer = mz.trainers.Trainer(
        model=model,
        optimizer=optimizer,
        trainloader=trainloader,
        validloader=validloader,
        epochs=10
    )
    trainer.run()
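The general shape of such a training flow can be sketched as an epoch loop that updates the model on every training batch and then scores it on the validation batches. The code below is a toy illustration with a one-weight model and plain gradient descent. `ToyModel` and `run_training` are hypothetical names, and MatchZoo's `Trainer` may schedule validation, checkpointing, and early stopping differently.

```python
class ToyModel:
    """Hypothetical one-parameter model y = w * x, trained on squared error."""

    def __init__(self):
        self.w = 0.0

    def train_step(self, batch, lr=0.1):
        # Loss and gradient are both computed before the weight is updated.
        loss = sum((self.w * x - y) ** 2 for x, y in batch) / len(batch)
        grad = sum(2 * (self.w * x - y) * x for x, y in batch) / len(batch)
        self.w -= lr * grad
        return loss

    def evaluate(self, batches):
        pairs = [pair for batch in batches for pair in batch]
        return sum((self.w * x - y) ** 2 for x, y in pairs) / len(pairs)


def run_training(model, train_batches, valid_batches, epochs=10):
    """One pass over the training batches per epoch, then one validation pass."""
    history = []
    for epoch in range(1, epochs + 1):
        losses = [model.train_step(batch) for batch in train_batches]
        history.append({
            "epoch": epoch,
            "train_loss": sum(losses) / len(losses),
            "valid_loss": model.evaluate(valid_batches),
        })
    return history


toy = ToyModel()
log = run_training(
    toy,
    train_batches=[[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (0.5, 1.0)]],
    valid_batches=[[(4.0, 8.0)]],
)
```

The per-epoch `history` list is the kind of record a trainer would typically use to pick the best checkpoint or decide when to stop.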
References
If you are interested in cutting-edge research progress, take a look at Awesome Neural Models for Semantic Match.
Install
MatchZoo-py depends on PyTorch. There are two ways to install MatchZoo-py:
Install MatchZoo-py from PyPI:

    pip install matchzoo-py

Install MatchZoo-py from the GitHub source:

    git clone https://github.com/NTMC-Community/MatchZoo-py.git
    cd MatchZoo-py
    python setup.py install
Models
DRMM: an implementation of A Deep Relevance Matching Model for Ad-hoc Retrieval.
DRMMTKS: an implementation of A Deep Top-K Relevance Matching Model for Ad-hoc Retrieval.
ARC-I: an implementation of Convolutional Neural Network Architectures for Matching Natural Language Sentences.
ARC-II: an implementation of Convolutional Neural Network Architectures for Matching Natural Language Sentences.
DSSM: an implementation of Learning Deep Structured Semantic Models for Web Search using Clickthrough Data.
CDSSM: an implementation of Learning Semantic Representations Using Convolutional Neural Networks for Web Search.
MatchLSTM: an implementation of Machine Comprehension Using Match-LSTM and Answer Pointer.
DUET: an implementation of Learning to Match Using Local and Distributed Representations of Text for Web Search.
KNRM: an implementation of End-to-End Neural Ad-hoc Ranking with Kernel Pooling.
ConvKNRM: an implementation of Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search.
BiMPM: an implementation of Bilateral Multi-Perspective Matching for Natural Language Sentences.
Models under development: MatchPyramid, Match-SRNN, DeepRank, aNMM ...
Citation
If you use MatchZoo in your research, please cite it with the following BibTeX entry.
@inproceedings{Guo:2019:MLP:3331184.3331403,
author = {Guo, Jiafeng and Fan, Yixing and Ji, Xiang and Cheng, Xueqi},
title = {MatchZoo: A Learning, Practicing, and Developing System for Neural Text Matching},
booktitle = {Proceedings of the 42Nd International ACM SIGIR Conference on Research and Development in Information Retrieval},
series = {SIGIR'19},
year = {2019},
isbn = {978-1-4503-6172-9},
location = {Paris, France},
pages = {1297--1300},
numpages = {4},
url = {http://doi.acm.org/10.1145/3331184.3331403},
doi = {10.1145/3331184.3331403},
acmid = {3331403},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {matchzoo, neural network, text matching},
}
Development Team
Name | Role |
---|---|
Yixing Fan | Core Dev |
Jiangui Chen | Core Dev |
Yinqiong Cai | Core Dev |
Liang Pang | Core Dev |
Lixin Su | Dev |
Junfeng Tian | Dev |
Qinghua Wang | Documentation |
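Contribution
Please make sure to read the Contributing Guide before creating a pull request. If you have a MatchZoo-related paper, project, component, or tool, send a pull request to this awesome list!
Thank you to everyone who has already contributed to MatchZoo!
Bo Wang, Zeyi Wang, Liu Yang, Zizhen Wang, Zhou Yang, Jianpeng Hou, Lijuan Chen, Yukun Zheng, Niuguo Cheng, Dai Zhuyun, Aneesh Joshi, Zeno Gantner, Kai Huang, stanpcf, ChangQF, Mike Kellogg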
Project Organizers
License
Copyright (c) 2019, Yixing Fan (faneshion)