Python sematch package - PyPI

Semantic similarity framework for knowledge graphs

Project description of the sematch Python package

![logo](docs/sources/img/logo.png)

----

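Sematch makes it easy to compute semantic similarity scores for concepts, words, and entities. It focuses on knowledge-based semantic similarity metrics that rely on structural knowledge in a taxonomy (e.g. depth, path length, least common subsumer) and on statistical information content (corpus-IC and graph-IC). Knowledge-based approaches differ from their corpus-based counterparts, which rely on co-occurrence (e.g. Pointwise Mutual Information) or distributional similarity (Latent Semantic Analysis, Word2Vec, GloVe, etc.). Knowledge-based approaches are usually used for structured knowledge graphs (KGs), while corpus-based approaches are normally applied to textual corpora.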

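For example, in text analysis, word similarity is first computed from the similarity scores of WordNet concepts, and sentence similarity is then computed by composing word similarity scores. Finally, document similarity can be computed by identifying important sentences, e.g. with TextRank.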
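The composition step from word scores to a sentence score can be sketched as follows. This is a minimal illustration under stated assumptions, not Sematch's implementation: `toy_word_similarity` is a hypothetical stand-in for a WordNet-based word similarity function, and averaging best-match scores in both directions is only one common way to compose word scores.

```python
def toy_word_similarity(w1, w2):
    """Hypothetical stand-in for a knowledge-based word similarity in [0, 1]."""
    if w1 == w2:
        return 1.0
    related = {frozenset(("dog", "cat")): 0.5, frozenset(("runs", "walks")): 0.6}
    return related.get(frozenset((w1, w2)), 0.0)


def directed_score(source, target, word_sim):
    """Average, over words in `source`, of the best match found in `target`."""
    return sum(max(word_sim(s, t) for t in target) for s in source) / float(len(source))


def sentence_similarity(sent1, sent2, word_sim):
    """Symmetric sentence score: mean of the two directed scores."""
    words1, words2 = sent1.lower().split(), sent2.lower().split()
    return 0.5 * (directed_score(words1, words2, word_sim)
                  + directed_score(words2, words1, word_sim))


score = sentence_similarity("the dog runs", "the cat walks", toy_word_similarity)
print(round(score, 3))  # 0.7
```

Any word similarity function with the same two-argument signature could be passed in place of the toy one.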

![logo](docs/sources/img/sematch-motivation.jpg)

![kg](docs/sources/img/kg.png)

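In KGs, concepts usually denote ontology classes, while entities refer to ontology instances. Moreover, concepts are usually organized into hierarchical taxonomies, such as the DBpedia ontology classes, so quantifying concept similarity in a KG relies on similar semantic information (e.g. path length, depth, least common subsumer, information content) and semantic similarity metrics (e.g. Path, Wu & Palmer, Li, Resnik, Lin, Jiang & Conrath, and WPath). Consequently, Sematch provides an integrated framework to develop and evaluate semantic similarity metrics for concepts, words, and entities, and their applications.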
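To make the structural quantities concrete, here is a small sketch that computes depth, the least common subsumer (LCS), path length, and the Path and Wu & Palmer scores on a toy taxonomy. The taxonomy and all function names are hypothetical and written for illustration only. The formulas are the standard textbook definitions (Path = 1 / (1 + path length), Wu & Palmer = 2 * depth(LCS) / (depth(c1) + depth(c2)), with the root at depth 1) and are not taken from Sematch's source code.

```python
# Hypothetical toy taxonomy: child -> parent (the root has parent None).
PARENT = {
    "thing": None,
    "agent": "thing",
    "person": "agent",
    "artist": "person",
    "actor": "artist",
    "dancer": "artist",
    "work": "thing",
    "film": "work",
}


def ancestors(concept):
    """Return the chain from `concept` up to the root, concept first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def depth(concept):
    """Depth in nodes, counting the root as depth 1."""
    return len(ancestors(concept))


def least_common_subsumer(c1, c2):
    """Deepest concept that is an ancestor of (or equal to) both inputs."""
    seen = set(ancestors(c1))
    return next(c for c in ancestors(c2) if c in seen)


def path_length(c1, c2):
    """Number of edges on the shortest path, which passes through the LCS."""
    lcs = least_common_subsumer(c1, c2)
    return (depth(c1) - depth(lcs)) + (depth(c2) - depth(lcs))


def path_similarity(c1, c2):
    return 1.0 / (1.0 + path_length(c1, c2))


def wup_similarity(c1, c2):
    lcs = least_common_subsumer(c1, c2)
    return 2.0 * depth(lcs) / (depth(c1) + depth(c2))


print(least_common_subsumer("actor", "dancer"))  # artist
print(wup_similarity("actor", "dancer"))         # 0.8
print(wup_similarity("actor", "film"))           # 0.25
```

Siblings deep in the taxonomy (actor, dancer) score higher than concepts that only meet at the root (actor, film), which is the intuition behind depth-aware metrics.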

----

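Sematch requires the scientific computing libraries numpy and scipy. An example of installing them with pip is shown below.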

````
pip install numpy scipy
````

After successfully installing **numpy** and **scipy**, you can install Sematch with the following commands.

```
pip install sematch
python -m sematch.download
```

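Alternatively, you can clone the development version and install it with setuptools. We recommend updating pip and setuptools first.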

```
git clone https://github.com/gsi-upm/sematch.git
cd sematch
python setup.py install
```

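We also provide a [Sematch-Demo Server](https://github.com/gsi-upm/sematch-demo). You can use it to experiment with the main functionalities or take it as an example of using Sematch to develop applications. Please check our [Documentation](http://gsi-upm.github.io/sematch/) for more details.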

### Computing Word Similarity

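The core module of Sematch measures semantic similarity between concepts that are organized in concept taxonomies. Word similarity is computed as the maximum semantic similarity between the WordNet concepts of the two words. You can use Sematch to compute multilingual word similarity based on WordNet with various semantic similarity metrics.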

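```python
from sematch.semantic.similarity import WordNetSimilarity
wns = WordNetSimilarity()

# Computing English word similarity using the Li method
wns.word_similarity('dog', 'cat', 'li')  # 0.449327301063
# Computing Spanish word similarity using the Lin method
wns.monol_word_similarity('perro', 'gato', 'spa', 'lin')  # 0.876800984373
# Computing Chinese word similarity using the Wu & Palmer method
wns.monol_word_similarity('狗', '猫', 'cmn', 'wup')  # 0.857142857143
# Computing Spanish and English word similarity using the Resnik method
wns.crossl_word_similarity('perro', 'cat', 'spa', 'eng', 'res')  # 7.91166650904
# Computing Spanish and Chinese word similarity using the Jiang & Conrath method
wns.crossl_word_similarity('perro', '猫', 'spa', 'cmn', 'jcn')
# Computing Chinese and English word similarity using the WPath method
wns.crossl_word_similarity('狗', 'cat', 'cmn', 'eng', 'wpath')  # 0.593666388463
```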
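The "maximum over concepts" rule for word similarity can be sketched without WordNet. In the snippet below the sense inventory, the concept scores, and both function names are hypothetical; it only illustrates taking the maximum concept similarity over all sense pairs and is not Sematch's implementation.

```python
from itertools import product

# Hypothetical sense inventory: each word maps to the concepts it can denote.
WORD_TO_CONCEPTS = {
    "bank": ["bank.financial_institution", "bank.river_side"],
    "shore": ["shore.land_by_water"],
}

# Hypothetical concept-level scores; a real system would derive them from a taxonomy.
CONCEPT_SCORES = {
    ("bank.financial_institution", "shore.land_by_water"): 0.1,
    ("bank.river_side", "shore.land_by_water"): 0.8,
}


def concept_similarity(c1, c2):
    """Look up a symmetric toy score, defaulting to 0.0 for unknown pairs."""
    if c1 == c2:
        return 1.0
    return CONCEPT_SCORES.get((c1, c2), CONCEPT_SCORES.get((c2, c1), 0.0))


def word_similarity(w1, w2):
    """Maximum concept similarity over all sense pairs of the two words."""
    pairs = product(WORD_TO_CONCEPTS.get(w1, []), WORD_TO_CONCEPTS.get(w2, []))
    return max((concept_similarity(a, b) for a, b in pairs), default=0.0)


print(word_similarity("bank", "shore"))  # 0.8
```

Taking the maximum means the closest pair of senses decides the score, so an ambiguous word such as "bank" is matched on its most relevant meaning.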

```python
from sematch.semantic.similarity import YagoTypeSimilarity
sim = YagoTypeSimilarity()

# Measuring YAGO concept similarity based on corpus-based information content (IC)
sim.yago_similarity('http://dbpedia.org/class/yago/Dancer109989502', 'http://dbpedia.org/class/yago/Actor109765278', 'wpath')  # 0.642
sim.yago_similarity('http://dbpedia.org/class/yago/Dancer109989502', 'http://dbpedia.org/class/yago/Singer110599806', 'wpath')  # 0.544
# Measuring YAGO concept similarity based on graph-based IC
sim.yago_similarity('http://dbpedia.org/class/yago/Dancer109989502', 'http://dbpedia.org/class/yago/Actor109765278', 'wpath_graph')  # 0.423
sim.yago_similarity('http://dbpedia.org/class/yago/Dancer109989502', 'http://dbpedia.org/class/yago/Singer110599806', 'wpath_graph')  # 0.328
```
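Information content and the metrics built on it can be illustrated with a few lines of plain Python. The taxonomy, the counts, and the function names below are hypothetical. The counts could stand for corpus occurrences (corpus-IC) or for numbers of instances in a graph (graph-IC), and the sketch does not reproduce how Sematch computes either. The formulas are the usual definitions: IC(c) = -log p(c), Resnik = IC(LCS), Lin = 2 * IC(LCS) / (IC(c1) + IC(c2)), and WPath = 1 / (1 + length * k^IC(LCS)) as defined in the TKDE paper listed under Publications.

```python
import math

# Hypothetical taxonomy (child -> parent) and toy occurrence counts per concept.
PARENT = {"entity": None, "person": "entity", "work": "entity",
          "artist": "person", "actor": "artist", "dancer": "artist"}
COUNTS = {"entity": 0, "person": 20, "work": 50,
          "artist": 5, "actor": 20, "dancer": 5}


def ancestors(concept):
    """Return the chain from `concept` up to the root, concept first."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def information_content(concept):
    """IC(c) = -log p(c), where p(c) counts c and all of its descendants."""
    subsumed = sum(n for c, n in COUNTS.items() if concept in ancestors(c))
    return -math.log(subsumed / float(sum(COUNTS.values())))


def lcs(c1, c2):
    """Least common subsumer: deepest shared ancestor of the two concepts."""
    seen = set(ancestors(c1))
    return next(c for c in ancestors(c2) if c in seen)


def path_length(c1, c2):
    common = lcs(c1, c2)
    return ancestors(c1).index(common) + ancestors(c2).index(common)


def resnik(c1, c2):
    return information_content(lcs(c1, c2))


def lin(c1, c2):
    return 2.0 * resnik(c1, c2) / (information_content(c1) + information_content(c2))


def wpath(c1, c2, k=0.8):
    """Path length weighted by the IC of the LCS; k is a tunable value in (0, 1]."""
    return 1.0 / (1.0 + path_length(c1, c2) * k ** resnik(c1, c2))


# Both pairs are two edges apart, but only the first shares an informative LCS.
print(wpath("actor", "dancer") > wpath("person", "work"))  # True
```

When the LCS is the root, its IC is zero and WPath reduces to the plain Path score, here 1/3 for the pair (person, work).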

```python
from sematch.semantic.graph import DBpediaDataTransform, Taxonomy
from sematch.semantic.similarity import ConceptSimilarity
concept = ConceptSimilarity(Taxonomy(DBpediaDataTransform()), 'models/dbpedia_type_ic.txt')
concept.name2concept('actor')
concept.similarity('http://dbpedia.org/ontology/Actor', 'http://dbpedia.org/ontology/Film', 'path')
concept.similarity('http://dbpedia.org/ontology/Actor', 'http://dbpedia.org/ontology/Film', 'wup')
concept.similarity('http://dbpedia.org/ontology/Actor', 'http://dbpedia.org/ontology/Film', 'li')
concept.similarity('http://dbpedia.org/ontology/Actor', 'http://dbpedia.org/ontology/Film', 'res')
concept.similarity('http://dbpedia.org/ontology/Actor', 'http://dbpedia.org/ontology/Film', 'lin')
concept.similarity('http://dbpedia.org/ontology/Actor', 'http://dbpedia.org/ontology/Film', 'jcn')
concept.similarity('http://dbpedia.org/ontology/Actor', 'http://dbpedia.org/ontology/Film', 'wpath')
```

```python
from sematch.semantic.similarity import EntitySimilarity
sim = EntitySimilarity()
sim.similarity('http://dbpedia.org/resource/Madrid', 'http://dbpedia.org/resource/Barcelona')  # 0.409923677282
sim.similarity('http://dbpedia.org/resource/Apple_Inc.', 'http://dbpedia.org/resource/Steve_Jobs')  # 0.09045454545454545
sim.relatedness('http://dbpedia.org/resource/Madrid', 'http://dbpedia.org/resource/Barcelona')  # 0.457984139871
sim.relatedness('http://dbpedia.org/resource/Apple_Inc.', 'http://dbpedia.org/resource/Steve_Jobs')  # 0.465991132787
```

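```python
from sematch.evaluation import WordSimEvaluation
from sematch.semantic.similarity import WordNetSimilarity
evaluation = WordSimEvaluation()
evaluation.dataset_names()
wns = WordNetSimilarity()
# Define the similarity metric
wpath = lambda x, y: wns.word_similarity_wpath(x, y, 0.8)
# Evaluate the similarity metric with the SimLex dataset
evaluation.evaluate_metric('wpath', wpath, 'noun_simlex')
# Perform Steiger's Z significance test
evaluation.statistical_test('wpath', 'path', 'noun_simlex')
# Define a similarity metric for Spanish words
wpath_es = lambda x, y: wns.monol_word_similarity(x, y, 'spa', 'path')
# Define a cross-lingual similarity metric for English-Spanish
wpath_en_es = lambda x, y: wns.crossl_word_similarity(x, y, 'eng', 'spa', 'wpath')
# Evaluate the metrics on multilingual word similarity datasets
evaluation.evaluate_metric('wpath_es', wpath_es, 'rg65_spanish')
evaluation.evaluate_metric('wpath_en_es', wpath_en_es, 'rg65_EN-ES')
```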
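Word similarity datasets pair words with human ratings, and a metric is judged by how well its scores correlate with those ratings. As a rough illustration of that kind of comparison, the sketch below computes Spearman's rank correlation in plain Python. The ratings and scores are made up, the function names are hypothetical, and which coefficient Sematch itself reports is not stated here.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = float(len(xs))
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up human ratings and metric scores for five word pairs.
human = [9.1, 7.5, 6.0, 2.2, 0.5]
metric = [0.92, 0.55, 0.60, 0.20, 0.05]
print(round(spearman(human, metric), 3))  # 0.9
```

Because only ranks matter, a metric does not need to be on the same scale as the human ratings to score well.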

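Word similarity evaluation relies on human judgements over word pairs, which may not reflect performance in real applications. Therefore, apart from word similarity evaluation, the Sematch evaluation framework also includes a simple aspect category classification task. The task classifies noun concepts such as pasta, noodle, steak, and tea into their ontological parent concepts, such as FOOD and DRINKS.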

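```python
from sematch.evaluation import AspectEvaluation
from sematch.application import SimClassifier
from sematch.semantic.similarity import WordNetSimilarity

# Create the aspect classification evaluation and load its dataset
evaluation = AspectEvaluation()
X, y = evaluation.load_dataset()
# Define the word similarity function
wns = WordNetSimilarity()
word_sim = lambda x, y: wns.word_similarity(x, y)
# Train and evaluate the metric with the unsupervised classification model
simclassifier = SimClassifier.train(zip(X, y), word_sim)
evaluation.evaluate(X, y, simclassifier)

# macro average:  (0.65319812882333839, 0.710124504998579, 0.66317566364913016, None)
# micro average:  (0.79210167952791644, 0.79210167952791644, 0.79210167952791644, None)
# weighted average:  (0.8084264506024054, 0.79210167952791644, 0.79639496616636352, None)
# accuracy:  0.792101679528
#              precision    recall  f1-score   support
#
#     SERVICE       0.50      0.43      0.46       519
#  RESTAURANT       0.81      0.66      0.73       228
#        FOOD       0.95      0.87      0.91      2256
#    LOCATION       0.26      0.67      0.37        54
#    AMBIENCE       0.60      0.70      0.65       597
#      DRINKS       0.81      0.93      0.87       752
#
# avg / total       0.81      0.79      0.80      4406
```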
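One simple way to classify with nothing but a similarity function is to assign each word to the category whose label it is most similar to. The sketch below shows that idea with a hypothetical score table and hypothetical function names; it is an assumption about how such an unsupervised classifier could work, not a description of `SimClassifier`.

```python
def toy_word_similarity(w1, w2):
    """Hypothetical similarity table standing in for a WordNet-based metric."""
    scores = {
        ("pasta", "food"): 0.8, ("pasta", "drinks"): 0.3,
        ("tea", "food"): 0.4, ("tea", "drinks"): 0.9,
        ("steak", "food"): 0.7, ("steak", "drinks"): 0.2,
    }
    return scores.get((w1, w2), 0.0)


def classify(word, categories, word_sim):
    """Return the category whose label is most similar to `word`."""
    return max(categories, key=lambda category: word_sim(word, category))


def accuracy(words, gold, categories, word_sim):
    """Fraction of words whose predicted category matches the gold label."""
    predicted = [classify(w, categories, word_sim) for w in words]
    return sum(p == g for p, g in zip(predicted, gold)) / float(len(gold))


words = ["pasta", "tea", "steak"]
gold = ["food", "drinks", "food"]
categories = ["food", "drinks"]
print([classify(w, categories, toy_word_similarity) for w in words])  # ['food', 'drinks', 'food']
print(accuracy(words, gold, categories, toy_word_similarity))         # 1.0
```

With this setup, a better similarity metric translates directly into better classification scores, which is why the task can serve as an application-level evaluation.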

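You can use Sematch to download a list of entities having a specific type, using different languages. Sematch generates SPARQL queries and executes them against the [DBpedia SPARQL Endpoint](http://dbpedia.org/sparql).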

```python
from sematch.application import Matcher
matcher = Matcher()
# Matching scientist entities from DBpedia
matcher.match_type('scientist')
matcher.match_type('científico', 'spa')
matcher.match_type('科学家', 'cmn')
matcher.match_entity_type('movies with Tom Cruise')
```

Example of an automatically generated SPARQL query.

```sql
SELECT DISTINCT ?s, ?label, ?abstract WHERE {
    {
    ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/NuclearPhysicist110364643> . }
 UNION {
    ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/Econometrician110043491> . }
 UNION {
    ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/Sociologist110620758> . }
 UNION {
    ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/Archeologist109804806> . }
 UNION {
    ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/class/yago/Neurolinguist110354053> . }
    ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
    ?s <http://www.w3.org/2000/01/rdf-schema#label> ?label .
    FILTER( lang(?label) = "en") .
    ?s <http://dbpedia.org/ontology/abstract> ?abstract .
    FILTER( lang(?abstract) = "en") .
} LIMIT 5000
```
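A query of this shape can be assembled from a list of type URIs with plain string formatting. The helper below is a hypothetical sketch written for illustration, not Sematch's query generator: the function name and parameters are assumptions, it omits the abstract and `owl#Thing` patterns for brevity, and it emits a space-separated `SELECT` list.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def build_type_query(type_uris, lang="en", limit=5000):
    """Build a SELECT query matching entities of any of the given types."""
    if not type_uris:
        raise ValueError("at least one type URI is required")
    # One group graph pattern per type, combined with UNION.
    branches = ["{ ?s <%s> <%s> . }" % (RDF_TYPE, uri) for uri in type_uris]
    lines = [
        "SELECT DISTINCT ?s ?label WHERE {",
        "  " + "\n  UNION ".join(branches),
        "  ?s <%s> ?label ." % RDFS_LABEL,
        '  FILTER( lang(?label) = "%s" ) .' % lang,
        "} LIMIT %d" % limit,
    ]
    return "\n".join(lines)


query = build_type_query([
    "http://dbpedia.org/class/yago/NuclearPhysicist110364643",
    "http://dbpedia.org/class/yago/Econometrician110043491",
])
print(query)
```

The resulting string could then be sent to a SPARQL endpoint with any HTTP or SPARQL client.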

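You can also use Sematch to extract features of entities and apply semantic similarity analysis using graph-based ranking algorithms. Given a list of objects (concepts, words, entities), Sematch computes their pairwise semantic similarity and generates a similarity graph in which nodes denote objects and edges denote similarity scores. The following example uses a similarity graph to extract important words from an entity description.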

```python
from sematch.semantic.graph import SimGraph
from sematch.semantic.similarity import WordNetSimilarity
from sematch.nlp import Extraction, word_process
from sematch.semantic.sparql import EntityFeatures
from collections import Counter

# Fetch the entity's features from DBpedia and extract nouns from its abstract
tom = EntityFeatures().features('http://dbpedia.org/resource/Tom_Cruise')
words = Extraction().extract_nouns(tom['abstract'])
words = word_process(words)
wns = WordNetSimilarity()
# Build the word similarity graph and rank its nodes with PageRank
word_graph = SimGraph(words, wns.word_similarity)
word_scores = word_graph.page_rank()
words, scores = zip(*Counter(word_scores).most_common(10))
print(words)
# (u'picture', u'action', u'number', u'film', u'post', u'sport',
#  u'program', u'men', u'performance', u'motion')
```
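The idea behind ranking nodes of a similarity graph can be sketched in plain Python. The code below builds a weighted undirected graph from pairwise scores and runs a basic PageRank power iteration. The function names, the toy similarity table, and the damping and iteration settings are assumptions made for illustration; this is not how `SimGraph` is implemented.

```python
from itertools import combinations


def build_similarity_graph(items, sim, threshold=0.0):
    """Return {node: {neighbor: weight}} with an edge for each pair scoring above threshold."""
    graph = {item: {} for item in items}
    for a, b in combinations(items, 2):
        weight = sim(a, b)
        if weight > threshold:
            graph[a][b] = weight
            graph[b][a] = weight
    return graph


def weighted_pagerank(graph, damping=0.85, iterations=50):
    """Power iteration for PageRank on an undirected weighted graph."""
    n = len(graph)
    scores = {node: 1.0 / n for node in graph}
    strength = {node: sum(edges.values()) for node, edges in graph.items()}
    for _ in range(iterations):
        # Nodes without edges spread their score uniformly over all nodes.
        dangling = sum(scores[node] for node in graph if strength[node] == 0)
        updated = {}
        for node in graph:
            incoming = sum(scores[nb] * w / strength[nb] for nb, w in graph[node].items())
            updated[node] = (1 - damping) / n + damping * (incoming + dangling / n)
        scores = updated
    return scores


def toy_similarity(w1, w2):
    """Hypothetical scores standing in for a WordNet-based word similarity."""
    table = {frozenset(("film", "picture")): 0.9,
             frozenset(("film", "action")): 0.5,
             frozenset(("picture", "action")): 0.4}
    return table.get(frozenset((w1, w2)), 0.0)


words = ["film", "picture", "action", "sport"]
scores = weighted_pagerank(build_similarity_graph(words, toy_similarity))
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Words that are strongly similar to many other words accumulate score, while an isolated word such as "sport" in this toy example ends up at the bottom of the ranking.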

----

## Publications

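- Ganggao Zhu and Carlos A. Iglesias. [Computing Semantic Similarity of Concepts in Knowledge Graphs](http://ieeexplore.ieee.org/document/7572993/). IEEE Transactions on Knowledge and Data Engineering 29.1 (2017): 72-85.

- Oscar Araque, Ganggao Zhu, Manuel Garcia-Amado and Carlos A. Iglesias. [Mining the Opinionated Web: Classification and Detection of Aspect Contexts for Aspect Based Sentiment Analysis](http://sentic.net/sentire2016araque.pdf). ICDM SENTIRE, 2016.

- Ganggao Zhu and Carlos Angel Iglesias. "Sematch: Semantic Entity Search from Knowledge Graph." SumPre-HSWI@ESWC. 2015.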

----

The project is mainly maintained by Ganggao Zhu. You can contact him via gzhu [at] dit.upm.es

----

## Why this name, Sematch and Logo?

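The name Sematch is composed of the Spanish "se" and the English "match". It is also an abbreviation of semantic matching, because semantic similarity metrics help to determine the semantic distance of concepts, words, and entities, instead of exact matching.

The logo of Sematch is based on the Chinese [Yin and Yang](http://en.wikipedia.org/wiki/Yin_and_yang), which is described in the [I Ching](http://en.wikipedia.org/wiki/I_Ching). Somehow, it correlates to 0 and 1 in computer science.

![GSI Logo](http://vps161.cesvima.upm.es/images/stories/logos/gsi.png)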

sematch 1.0.4
