Getting usable information out of an RDF graph with RDFlib

Question

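I am learning to use RDF and, as an exercise, am trying to extract a set of facts from DBpedia. The code sample below more or less works, but for predicates such as spouse it also returns the person themselves.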

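Issues:

The get_name_from_uri() function takes the last part of the URI and strips the underscores; there must be a better way to get a person's name.
The spouse results return not only the spouses but also the data subject itself, and I don't understand why.
Some results come back as a mix of URIs and plain text.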

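Here is the output of the code, showing some of the odd results I get (note the mixed output for the title property, that he is apparently married to himself, and that Joséphine's name comes out garbled):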

Accessing facts for Napoleon  held at  http://dbpedia.org/resource/Napoleon

There are  800  facts about Napoleon stored at the URI
http://dbpedia.org/resource/Napoleon

Here are a few:-
Ontology:deathdate

Napoleon died on 1821-05-05

Ontology:birthdate
Napoleon was born on 1769-08-15

Property:spouse returns the person themselves twice!
Napoleon was married to  Marie Louise, Duchess of Parma
Napoleon was married to  Napoleon
Napoleon was married to  Jos%C3%A9phine de Beauharnais
Napoleon was married to  Napoleon

Property:title returns text and URIs
Napoleon  Held the title:  "The Death of Napoleon"
Napoleon  Held the title: http://dbpedia.org/resource/Emperor_of_the_French
Napoleon  Held the title: http://dbpedia.org/resource/King_of_Italy
Napoleon  Held the title:  First Consul of France
Napoleon  Held the title:  Provisional Consul of France
Napoleon  Held the title:  http://dbpedia.org/resource/Napoleon
Napoleon  Held the title:  Emperor of the French
Napoleon  Held the title: http://dbpedia.org/resource/Co-Princes_of_Andorra
Napoleon  Held the title:  from the Memoirs of Bourrienne, 1831
Napoleon  Held the title:  Protector of the Confederation of the Rhine

Ontology birth place returns three records
Napoleon was born in  Ajaccio
Napoleon was born in  Corsica
Napoleon was born in  Early modern France

Here is the Python code that produced the output above. It requires the rdflib library and is still a work in progress.

import rdflib
from rdflib import Graph, URIRef, RDF

######################################
#  A quick test of the Python library rdflib to get data from an RDF graph
# D Moore 15/3/2014
# needs rdflib > version 3.0

# CHANGE THE URI BELOW TO A DIFFERENT PERSON AND SEE WHAT HAPPENS
# COULD DO WITH A WEB FORM 
# NOTES:
#
#URI_ref = 'http://dbpedia.org/resource/Richard_Nixon'
#URI_ref = 'http://dbpedia.org/resource/Margaret_Thatcher'
#URI_ref = 'http://dbpedia.org/resource/Isaac_Newton'
#URI_ref = 'http://dbpedia.org/resource/Richard_Nixon'
URI_ref = 'http://dbpedia.org/resource/Napoleon'
#URI_ref = 'http://dbpedia.org/resource/apple'
##########################################################


def get_name_from_uri(dbpedia_uri):  
    # pulls the last part of a uri out and removes underscores
    # got to be an easier way but it works
    output_string = ""
    s = dbpedia_uri
    # chop the URL into pieces divided by the /
    tokens = s.split("/")
    # the name of our person is in the last section, so iterate through each token
    # and replace the underscores with spaces (only the last token's result is kept)
    for i in tokens :
        str = ''.join([i])
        output_string = str.replace('_',' ')
    # returns the name of the person without underscores 
    return(output_string)

def is_person(uri):
#####  SPARQL way to do this
    uri = URIRef(uri)
    person = URIRef('http://dbpedia.org/ontology/Person')
    g= Graph()
    g.parse(uri)
    resp = g.query(
        "ASK {?uri a ?person}",
        initBindings={'uri': uri, 'person': person}
    )
    print uri, "is a person?", resp.askAnswer
    return resp.askAnswer

URI_NAME = get_name_from_uri(URI_ref)
NAME_LABEL = ''

if is_person(URI_ref):
    print "Accessing facts for", URI_NAME, " held at ", URI_ref

    g = Graph()
    g.parse(URI_ref)
    print "Person Extract for", URI_NAME
    print "There are ",len(g)," facts about", URI_NAME, "stored at the URI ",URI_ref
    print "Here are a few:-"


    # Ok so lets get some facts for our person
    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthName")):
        print URI_NAME, "was born " + str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/deathDate")):
        print URI_NAME, "died on", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthDate")):
        print URI_NAME, "was born on", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/eyeColor")):
        print URI_NAME, "had eyes coloured", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/spouse")):
        print URI_NAME, "was married to ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/reigned")):
        print URI_NAME, "reigned ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/children")):
        print URI_NAME, "had a child called ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/profession")):
        print URI_NAME, "(PROPERTY profession) was trained as a  ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/child")):
        print URI_NAME, "PROPERTY child ", get_name_from_uri(str(stmt[1]))

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/deathplace")):
        print URI_NAME, "(PROPERTY death place) died at: ", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/property/title")):
        print URI_NAME, "(PROPERTY title) Held the title: ", str(stmt[1])


    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/sex")):
        print URI_NAME, "was a ", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/knownfor")):
        print URI_NAME, "was known for ", str(stmt[1])

    for stmt in g.subject_objects(URIRef("http://dbpedia.org/ontology/birthPlace")):
        print URI_NAME, "was born in ", get_name_from_uri(str(stmt[1]))

else:
    print "ERROR - "
    print "Resource", URI_ref, 'does not look to be a person or there is no record in dbpedia'

Tags: code optimization, information extraction, uri, rdf, rdflib, dbpedia, data subject, fact extraction

1 Answer

Getting the name

The unexpected spouse

subject_objects(self, predicate=None)

objects(self, subject=None, predicate=None)

URI versus text (literal) results
