CwnSenseTagger: a package for word sense disambiguation using Chinese Wordnet
Word Sense Disambiguation Based on Chinese Wordnet
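Chinese is a complex language, and Chinese word sense disambiguation has long been a hard problem. A single word can carry dozens or even hundreds of senses depending on context, and labeling word senses by hand is labor-intensive and inefficient.
In this project, we aim to address this problem with the state-of-the-art BERT model. It gives a large performance improvement and reaches about 82% accuracy on Chinese word sense disambiguation.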
Prerequisites
- The input must be tokenized first. POS tags are preferred but not required.
- Suppose we have m sentences, and each sentence has $n_m$ words.
- The input is a list of sentences, and each sentence is a list of words:
  list of sentences [ list of words [ [target, pos, sense_id, sense] * $n_m$ ] * m ]
- Here is an example with two sentences, showing the format of the input data:
  [[["他","Nh","",""],["由","P","",""],["昏沈","VH","",""],["的","DE","",""],["睡夢","Na","",""],["中","Ng","",""],["醒來","VH","",""],[",","COMMACATEGORY","",""]], [["臉","Na","",""],["上","Ncd","",""],["濕涼","VH","",""],["的","DE","",""],["騷動","Nv","",""],["是","SHI","",""],["淚","Na","",""],["。","PERIODCATEGORY","",""]]]
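The nested-list layout described above can be assembled from already tokenized text with a few lines of plain Python. The following is only a minimal sketch: `build_input` and `check_input` are hypothetical helper names introduced here for illustration and are not part of CwnSenseTagger. It assumes that `sense_id` and `sense` are left as empty strings in the input, as in the example, and that a missing POS tag is also represented by an empty string.

```python
from typing import List, Optional, Sequence, Tuple

Word = List[str]        # [target, pos, sense_id, sense]
Sentence = List[Word]


def build_input(
    sentences: Sequence[Sequence[Tuple[str, Optional[str]]]]
) -> List[Sentence]:
    """Turn (token, pos) pairs into [target, pos, sense_id, sense] entries.

    Hypothetical helper, not part of CwnSenseTagger.
    """
    data: List[Sentence] = []
    for sent in sentences:
        words: Sentence = []
        for token, pos in sent:
            # sense_id and sense start out empty; a missing POS tag
            # is stored as an empty string (an assumption of this sketch).
            words.append([token, pos or "", "", ""])
        data.append(words)
    return data


def check_input(data: Sequence[Sequence[Sequence[str]]]) -> None:
    """Raise ValueError if any word is not a list of four strings.

    Hypothetical helper, not part of CwnSenseTagger.
    """
    for i, sent in enumerate(data):
        for j, word in enumerate(sent):
            if len(word) != 4 or not all(isinstance(f, str) for f in word):
                raise ValueError(
                    f"sentence {i}, word {j}: expected 4 string fields, got {word!r}"
                )
            if not word[0]:
                raise ValueError(f"sentence {i}, word {j}: empty target token")


# Two short tokenized sentences; the second has one token without a POS tag.
tokenized = [
    [("他", "Nh"), ("由", "P"), ("昏沈", "VH")],
    [("臉", "Na"), ("上", None)],
]
data = build_input(tokenized)
check_input(data)
print(data)
```

Checking the structure before tagging keeps format mistakes, such as a word with only two fields, from surfacing later as harder-to-read errors.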
How to get senses
- Run the code from the project root directory (the one containing setup.py).
- Examples can be found in the example folder.
Acknowledgements
We thank 陈宝文 (b05902117@ntu.edu.tw) and Yu Yu Wu (b06902104@ntu.edu.tw) for their contributions to model development.