LightRDF
A fast and lightweight Python RDF parser which wraps bindings to Rust's Rio using PyO3.
Features
- Supports N-Triples, Turtle, and RDF/XML
- Handles large RDF documents
- Provides an HDT-like interface
Installation
pip install lightrdf
Usage
Iterate over all triples (Parser)
Iterate over all triples (HDT-like)

    import lightrdf

    doc = lightrdf.RDFDocument("./go.owl")
    # ...or lightrdf.RDFDocument("./go.owl", base_iri="", parser=lightrdf.xml.PatternParser) for xml

    # `None` matches arbitrary term
    for triple in doc.search_triples(None, None, None):
        print(triple)

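To make the wildcard semantics concrete, here is a minimal pure-Python sketch of how a `None` term can act as "match anything" in a triple pattern. It is an illustration only and not LightRDF's implementation (which is backed by Rio in Rust); the function name `match_pattern` and the sample triples are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]


def match_pattern(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples whose terms equal every non-None pattern term."""
    pattern = (s, p, o)
    for triple in triples:
        # A None in the pattern is a wildcard; any other value must match exactly.
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


# Hypothetical sample data, not taken from go.owl.
sample = [
    ("http://example.org/a", "http://example.org/knows", "http://example.org/b"),
    ("http://example.org/a", "http://example.org/name", '"Alice"'),
    ("http://example.org/b", "http://example.org/name", '"Bob"'),
]

all_triples = list(match_pattern(sample, None, None, None))
about_a = list(match_pattern(sample, "http://example.org/a", None, None))
names = list(match_pattern(sample, None, "http://example.org/name", None))
```

With all three terms left as `None` the generator yields every triple, which mirrors the `search_triples(None, None, None)` call above.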
Triple pattern (HDT-like)

    import lightrdf

    doc = lightrdf.RDFDocument("./go.owl")

    for triple in doc.search_triples("http://purl.obolibrary.org/obo/GO_0005840", None, None):
        print(triple)

    # Output:
    # ('http://purl.obolibrary.org/obo/GO_0005840', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://www.w3.org/2002/07/owl#Class')
    # ('http://purl.obolibrary.org/obo/GO_0005840', 'http://www.w3.org/2000/01/rdf-schema#subClassOf', 'http://purl.obolibrary.org/obo/GO_0043232')
    # ...
    # ('http://purl.obolibrary.org/obo/GO_0005840', 'http://www.geneontology.org/formats/oboInOwl#inSubset', 'http://purl.obolibrary.org/obo/go#goslim_yeast')
    # ('http://purl.obolibrary.org/obo/GO_0005840', 'http://www.w3.org/2000/01/rdf-schema#label', '"ribosome"^^<http://www.w3.org/2001/XMLSchema#string>')

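In the output above, a typed literal appears as a single string such as `'"ribosome"^^<http://www.w3.org/2001/XMLSchema#string>'`. The following is a minimal sketch of splitting such a string into its lexical value and datatype IRI. It assumes exactly the textual form shown in that output; the helper name `split_typed_literal` is hypothetical and not part of LightRDF, and the sketch does not unescape characters or handle language-tagged literals.

```python
import re
from typing import Optional, Tuple

# Matches "<lexical value>"^^<datatype IRI>, the form seen in the output above.
_TYPED_LITERAL = re.compile(r'"(?P<value>.*)"\^\^<(?P<datatype>[^<>]*)>', re.DOTALL)


def split_typed_literal(term: str) -> Optional[Tuple[str, str]]:
    """Return (lexical value, datatype IRI), or None if term is not a typed literal."""
    match = _TYPED_LITERAL.fullmatch(term)
    if match is None:
        return None
    return match.group("value"), match.group("datatype")


label = split_typed_literal('"ribosome"^^<http://www.w3.org/2001/XMLSchema#string>')
not_literal = split_typed_literal("http://purl.obolibrary.org/obo/GO_0005840")
```

Terms that are plain IRIs, like the subject in the example, return `None`, so the helper can be applied to any position of a triple.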
Tip: Open a file with Python (Parser)

    import lightrdf

    parser = lightrdf.Parser()

    with open("./go.owl", "rb") as f:
        for triple in parser.parse(f, format="owl", base_iri=None):
            print(triple)

    # Alternatively, use the RDF/XML-specific parser
    import lightrdf

    parser = lightrdf.xml.Parser()

    with open("./go.owl", "rb") as f:
        for triple in parser.parse(f, base_iri=None):
            print(triple)

Tip: Open a file with Python (HDT-like)

    import lightrdf

    with open("./go.owl", "rb") as f:
        doc = lightrdf.RDFDocument(f, parser=lightrdf.xml.PatternParser)

        for triple in doc.search_triples("http://purl.obolibrary.org/obo/GO_0005840", None, None):
            print(triple)

Tip: Parse from a string

    import io
    import lightrdf

    data = """<http://one.example/subject1> <http://one.example/predicate1> <http://one.example/object1> . # comments here
    # or on a line by themselves
    _:subject1 <http://an.example/predicate1> "object1" .
    _:subject2 <http://an.example/predicate2> "object2" .
    """

    doc = lightrdf.RDFDocument(io.BytesIO(data.encode()), parser=lightrdf.turtle.PatternParser)

    for triple in doc.search_triples("http://one.example/subject1", None, None):
        print(triple)

Benchmark (WIP)
On MacBook Air (13-inch, 2017), 1.8 GHz Intel Core i5, 8 GB 1600 MHz DDR3
https://gist.github.com/ozekik/b2ae3be0fcaa59670d4dd4759cdffbed

    $ wget -q http://purl.obolibrary.org/obo/go.owl
    $ gtime python3 count_triples_rdflib_graph.py ./go.owl  # RDFLib 4.2.2
    1436427
    235.29user 2.30system 3:59.56elapsed 99%CPU (0avgtext+0avgdata 1055816maxresident)k
    0inputs+0outputs (283major+347896minor)pagefaults 0swaps
    $ gtime python3 count_triples_lightrdf_rdfdocument.py ./go.owl  # LightRDF 0.1.1
    1436427
    7.90user 0.22system 0:08.27elapsed 98%CPU (0avgtext+0avgdata 163760maxresident)k
    0inputs+0outputs (106major+41389minor)pagefaults 0swaps
    $ gtime python3 count_triples_lightrdf_parser.py ./go.owl  # LightRDF 0.1.1
    1436427
    8.00user 0.24system 0:08.47elapsed 97%CPU (0avgtext+0avgdata 163748maxresident)k
    0inputs+0outputs (106major+41388minor)pagefaults 0swaps

https://gist.github.com/ozekik/636a8fb521401070e02e010ce591fa92

    $ wget -q http://downloads.dbpedia.org/2016-10/dbpedia_2016-10.nt
    $ gtime python3 count_triples_rdflib_ntparser.py dbpedia_2016-10.nt  # RDFLib 4.2.2
    31050
    1.63user 0.23system 0:02.47elapsed 75%CPU (0avgtext+0avgdata 26568maxresident)k
    0inputs+0outputs (1140major+6118minor)pagefaults 0swaps
    $ gtime python3 count_triples_lightrdf_ntparser.py dbpedia_2016-10.nt  # LightRDF 0.1.1
    31050
    0.21user 0.04system 0:00.36elapsed 71%CPU (0avgtext+0avgdata 7628maxresident)k
    0inputs+0outputs (534major+1925minor)pagefaults 0swaps

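As a reading aid for the timings above, the following sketch turns the `elapsed` and `maxresident` fields reported by `gtime` into wall-clock and peak-memory ratios. The figures are copied from the runs above; the helper names `elapsed_seconds` and `ratios` are hypothetical. These are single runs on one machine, so the ratios are rough indications only.

```python
from typing import Tuple

Run = Tuple[str, int]  # (elapsed stamp, max resident set size in KB)


def elapsed_seconds(stamp: str) -> float:
    """Convert a GNU time elapsed stamp such as '3:59.56' or 'h:mm:ss' to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def ratios(baseline: Run, candidate: Run) -> Tuple[float, float]:
    """Return (time ratio, memory ratio) of baseline over candidate."""
    (base_time, base_kb), (cand_time, cand_kb) = baseline, candidate
    return (
        elapsed_seconds(base_time) / elapsed_seconds(cand_time),
        base_kb / cand_kb,
    )


# go.owl: RDFLib 4.2.2 graph vs. LightRDF 0.1.1 RDFDocument
go_time, go_mem = ratios(("3:59.56", 1055816), ("0:08.27", 163760))

# DBpedia N-Triples sample: RDFLib 4.2.2 vs. LightRDF 0.1.1
nt_time, nt_mem = ratios(("0:02.47", 26568), ("0:00.36", 7628))

print(f"go.owl: {go_time:.1f}x faster, {go_mem:.1f}x less peak memory")
print(f"dbpedia: {nt_time:.1f}x faster, {nt_mem:.1f}x less peak memory")
```

On these numbers, LightRDF comes out at roughly 29 times faster on go.owl and roughly 7 times faster on the N-Triples file.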
Alternatives
Todo
- [x] Push to PyPI
- [x] Adopt CI
- [x] Handle Base IRI
- [x] Add basic tests
- [ ] Support NQuads and TriG
- [ ] Add docs
- [ ] Add tests for w3c/rdf-tests
- [ ] Resume on error
- [x] Allow opening fp
License
许可证
Rio and PyO3 are licensed under the Apache-2.0 license.
Copyright 2020 Kentaro Ozeki
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.