MatchZoo
Facilitating the design, comparison and sharing of deep text matching models.
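MatchZoo is a general-purpose text matching toolkit. It aims to make it easy to quickly implement, compare, and share the latest deep text matching models.
The goal of MatchZoo is to provide a high-quality codebase for deep text matching research, such as document retrieval, question answering, conversational response ranking, and paraphrase identification. With a unified data processing pipeline, simplified model configuration, and automatic hyper-parameter tuning, MatchZoo is flexible and easy to use.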
Tasks | Text 1 | Text 2 | Objective |
---|---|---|---|
Paraphrase Identification | string 1 | string 2 | classification |
Textual Entailment | text | hypothesis | classification |
Question Answer | question | answer | classification/ranking |
Conversation | dialog | response | classification/ranking |
Information Retrieval | query | document | ranking |
Get Started in 60 Seconds
To train a Deep Semantic Structured Model, import matchzoo and prepare the input data.
import matchzoo as mz
train_pack = mz.datasets.wiki_qa.load_data('train', task='ranking')
valid_pack = mz.datasets.wiki_qa.load_data('dev', task='ranking')
predict_pack = mz.datasets.wiki_qa.load_data('test', task='ranking')
Preprocess the input data in three lines of code, keeping track of the parameters to be passed into the model.
preprocessor = mz.preprocessors.DSSMPreprocessor()
train_processed = preprocessor.fit_transform(train_pack)
valid_processed = preprocessor.transform(valid_pack)
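For context, the DSSM paper represents each word as a bag of letter trigrams ("word hashing") before it reaches the network. The sketch below illustrates that idea in plain Python. It is not the code behind `DSSMPreprocessor`, whose exact steps are not described here, and the helper names `letter_trigrams` and `hash_text` are hypothetical.

```python
from collections import Counter


def letter_trigrams(word):
    """Split one word into letter trigrams, using '#' as the boundary mark."""
    marked = "#" + word.lower() + "#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]


def hash_text(text):
    """Count the letter trigrams of every whitespace-separated token in text."""
    counts = Counter()
    for token in text.split():
        counts.update(letter_trigrams(token))
    return counts


# Illustrative input only; real preprocessing would also handle punctuation etc.
print(letter_trigrams("good"))
print(hash_text("deep text matching").most_common(3))
```

Because the trigram vocabulary is much smaller than a word vocabulary, unseen words still map onto known input dimensions, which is the motivation given in the DSSM paper.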
Make use of MatchZoo's customized loss functions and evaluation metrics:
ranking_task = mz.tasks.Ranking(loss=mz.losses.RankCrossEntropyLoss(num_neg=4))
ranking_task.metrics = [
    mz.metrics.NormalizedDiscountedCumulativeGain(k=3),
    mz.metrics.NormalizedDiscountedCumulativeGain(k=5),
    mz.metrics.MeanAveragePrecision()
]
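To make the quantities named above concrete, here is a minimal pure-Python sketch of NDCG@k, average precision for a single query, and a softmax cross-entropy ranking loss over one positive and several negative scores. It assumes one common formulation (gain `2^rel - 1`, `log2` discount, label > 0 counts as relevant); MatchZoo's own implementations may differ in detail, and all function names here are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k labels, given in ranked order."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def rank_by_score(labels, scores):
    """Return the labels reordered by descending model score."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return [label for _, label in pairs]


def ndcg_at_k(labels, scores, k):
    """NDCG@k: DCG of the predicted ranking divided by DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant document for this query
    return dcg_at_k(rank_by_score(labels, scores), k) / ideal_dcg


def average_precision(labels, scores):
    """Average precision for one query; MAP is the mean of this over queries."""
    hits, total = 0, 0.0
    for rank, label in enumerate(rank_by_score(labels, scores), start=1):
        if label > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def rank_cross_entropy(pos_score, neg_scores):
    """Softmax cross entropy with the positive document as the target class."""
    scores = [pos_score] + list(neg_scores)
    peak = max(scores)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
    return log_sum - pos_score


# Made-up labels and scores for one query, for illustration only.
labels = [0, 1, 0, 1]
scores = [0.2, 0.9, 0.4, 0.1]
print(ndcg_at_k(labels, scores, k=3), average_precision(labels, scores))
print(rank_cross_entropy(1.5, [0.3, -0.2, 0.1, 0.0]))
```

The loss is computed per group of one positive and `num_neg` negatives, which is why the same `num_neg` value appears both in the loss and in the pair-wise data generator.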
Initialize the model and fine-tune the hyper-parameters.
model = mz.models.DSSM()
model.params['input_shapes'] = preprocessor.context['input_shapes']
model.params['task'] = ranking_task
model.params['mlp_num_layers'] = 3
model.params['mlp_num_units'] = 300
model.params['mlp_num_fan_out'] = 128
model.params['mlp_activation_func'] = 'relu'
model.guess_and_fill_missing_params()
model.build()
model.compile()
Generate pair-wise training data on the fly, and evaluate model performance on the validation data using a customized callback.
train_generator = mz.PairDataGenerator(train_processed, num_dup=1, num_neg=4,
                                       batch_size=64, shuffle=True)
valid_x, valid_y = valid_processed.unpack()
evaluate = mz.callbacks.EvaluateAllMetrics(model, x=valid_x, y=valid_y,
                                           batch_size=len(valid_y))
history = model.fit_generator(train_generator, epochs=20, callbacks=[evaluate],
                              workers=5, use_multiprocessing=False)
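The idea of pair-wise generation can be sketched without MatchZoo: for every query, each relevant document is grouped with `num_neg` sampled non-relevant documents, and the grouping is repeated `num_dup` times. The following is a simplified illustration under those assumptions, not the logic of `PairDataGenerator` itself; `make_pairs` and the tuple layout are hypothetical.

```python
import random


def make_pairs(relations, num_dup=1, num_neg=4, seed=0):
    """Build (query, positive, negatives) training groups.

    relations: list of (query_id, doc_id, label) tuples; label > 0 means relevant.
    """
    rng = random.Random(seed)
    by_query = {}
    for query_id, doc_id, label in relations:
        by_query.setdefault(query_id, []).append((doc_id, label))

    groups = []
    for query_id, docs in by_query.items():
        positives = [d for d, label in docs if label > 0]
        negatives = [d for d, label in docs if label <= 0]
        if not positives or not negatives:
            continue  # a query needs both kinds of documents to form a group
        for pos in positives:
            for _ in range(num_dup):
                # Sample with replacement so short candidate lists still work.
                sampled = [rng.choice(negatives) for _ in range(num_neg)]
                groups.append((query_id, pos, sampled))
    return groups


# Made-up relation data for illustration only.
relations = [
    ("q1", "d1", 1), ("q1", "d2", 0), ("q1", "d3", 0),
    ("q2", "d4", 0),  # skipped: q2 has no relevant document
]
for group in make_pairs(relations, num_dup=1, num_neg=4):
    print(group)
```

Sampling inside the generator rather than materializing every pair up front keeps memory use low and lets each epoch see different negatives.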
References
If you are interested in cutting-edge research progress, please take a look at awesome neural models for semantic match.
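Install
MatchZoo depends on Keras, so please install one of its backend engines: TensorFlow, Theano, or CNTK. We recommend the TensorFlow backend. There are two ways to install MatchZoo:
Install MatchZoo from PyPI:
pip install matchzoo
Install MatchZoo from the GitHub source: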
git clone https://github.com/NTMC-Community/MatchZoo.git
cd MatchZoo
python setup.py install
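After either installation route, one way to confirm that the package is visible to the current interpreter is to query its installed version with the standard library (Python 3.8+). This is a generic check rather than anything MatchZoo-specific, and the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if MatchZoo is installed, otherwise None.
print(installed_version("matchzoo"))
```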
Models:
DRMM: this model is an implementation of A Deep Relevance Matching Model for Ad-hoc Retrieval.
ARC-I: this model is an implementation of Convolutional Neural Network Architectures for Matching Natural Language Sentences.
DSSM: this model is an implementation of Learning Deep Structured Semantic Models for Web Search using Clickthrough Data.
CDSSM: this model is an implementation of Learning Semantic Representations Using Convolutional Neural Networks for Web Search.
ARC-II: this model is an implementation of Convolutional Neural Network Architectures for Matching Natural Language Sentences.
MV-LSTM: this model is an implementation of A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations.
aNMM: this model is an implementation of aNMM: Ranking Short Answer Texts with Attention-Based Neural Matching Model.
DUET: this model is an implementation of Learning to Match Using Local and Distributed Representations of Text for Web Search.
K-NRM: this model is an implementation of End-to-End Neural Ad-hoc Ranking with Kernel Pooling.
CONV-KNRM: this model is an implementation of Convolutional neural networks for soft-matching n-grams in ad-hoc search.
Models under development: Match-SRNN, DeepRank, BiMPM…
Citation
If you use MatchZoo in your research, please use the following BibTeX entry.
@article{fan2017matchzoo,
title={Matchzoo: A toolkit for deep text matching},
author={Fan, Yixing and Pang, Liang and Hou, JianPeng and Guo, Jiafeng and Lan, Yanyan and Cheng, Xueqi},
journal={arXiv preprint arXiv:1707.07270},
year={2017}
}
Development Team
Name | Role |
---|---|
Fan Yixing | Core Dev |
Wang Bo | Core Dev |
Wang Zeyi | Core Dev |
Pang Liang | Core Dev |
Yang Liu | Core Dev |
Wang Qinghua | Documentation |
Wang Zizhen | Dev |
Su Lixin | Dev |
Yang Zhou | Dev |
Tian Junfeng | Dev |
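Contribution
Please make sure to read the Contributing Guide before creating a pull request. If you have a MatchZoo-related paper/project/component/tool, send a pull request to this awesome list!
Thank you to all the people who have already contributed to MatchZoo!
Jianpeng Hou, Lijuan Chen, Yukun Zheng, Niuguo Cheng, Dai Zhuyun, Aneesh Joshi, Zeno Gantner, Kai Huang, stanpcf, ChangQF, Mike Kellogg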
Project Organizers
License
Copyright (c) 2015, Yixing Fan (faneshion)