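How do I parse a .ttl file with RDFLib?

3 answers

User

Answer 1 · edited 2024-05-15 21:09:52

You can do what Snakes and Coffee suggested, simply wrapping that function (or its code) in a loop with a yield statement. This creates a generator that can be iterated to build the dict for the next line on the fly. Suppose you want to write the results to a CSV, for example, using Snakes' parse_to_dict: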

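import re
import csv

# Python 3: csv needs a text-mode file opened with newline="" (not "wb")
writer = csv.DictWriter(open(outfile, "w", newline=""), fieldnames=["id", "name", "address", "phone"])
# or whatever field names your data has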

You can create the generator either as a function or as an inline generator expression:

def dict_generator(lines): 
    for line in lines: 
        yield parse_to_dict(line)

--or--

dict_generator = (parse_to_dict(line) for line in lines)

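These are pretty much equivalent. At this point you can get one parsed line as a dict by calling next(dict_generator) (dict_generator.next() in Python 2), and you will get them one at a time, with no extra RAM thrashing involved.

If you have 16 gigs of raw data, you might also consider making a generator to pull the lines in. They are really useful.

More on generators from SO and the docs: "What can you use Python generator functions for?" and http://wiki.python.org/moin/Generators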
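
To make the pieces above concrete, here is a minimal end-to-end sketch. The body of parse_to_dict is a hypothetical stand-in (the real one lives in Snakes' answer and is not shown here), and the angle-bracket line format and the in-memory buffers are assumptions for illustration only.

```python
import csv
import io
import re

FIELDS = ["id", "name", "address", "phone"]

def parse_to_dict(line):
    # Hypothetical stand-in: map the <...> tokens on a line to the field names.
    return dict(zip(FIELDS, re.findall(r"<([^>]+)>", line)))

def dict_generator(lines):
    # Lazily yield one parsed dict per line; nothing is accumulated in memory.
    for line in lines:
        yield parse_to_dict(line)

# Assumed sample input; a real run would pass an open file object instead.
source = io.StringIO("<id1> <Alice> <USA> <12345>\n<id2> <Jane> <France> <78900>\n")
out = io.StringIO()  # stands in for open(outfile, "w", newline="")

writer = csv.DictWriter(out, fieldnames=FIELDS)
writer.writeheader()
rows = dict_generator(source)
first = next(rows)      # pull a single row on demand
writer.writerow(first)
writer.writerows(rows)  # stream the remaining rows one at a time
```

Because a file object is itself a lazy iterator over lines, passing it straight to dict_generator keeps only one line in memory at a time.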

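User

Answer 2 · edited 2024-05-15 21:09:52

Turtle is a subset of the Notation 3 syntax, so rdflib should be able to parse it using format='n3'. Check whether rdflib preserves comments (in your sample the ids are specified in comments (#...)). If it does not, and the input format is as simple as in your example, you can parse it by hand: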

import re
from collections import namedtuple
from itertools import takewhile

Entry = namedtuple('Entry', 'id name address phone')

def get_entries(path):
    with open(path) as file:
        # an entry starts with `#@` line and ends with a blank line
        for line in file:
            if line.startswith('#@'):
                buf = [line]
                buf.extend(takewhile(str.strip, file)) # read until blank line
                yield Entry(*re.findall(r'<([^>]+)>', ''.join(buf)))

print("\n".join(map(str, get_entries('example.ttl'))))

Output:

Entry(id='id1', name='Alice', address='USA', phone='12345')
Entry(id='id1', name='Jane', address='France', phone='78900')
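
One detail of get_entries is easier to see in isolation: takewhile reads from the same iterator as the outer for loop, so it consumes an entry's lines up to and including the blank separator, and the loop then resumes at the next `#@` line. The sketch below uses a plain list iterator in place of the file; the sample lines are an assumption reconstructed from the output shown above.

```python
from itertools import takewhile

# Assumed sample input, standing in for the lines of example.ttl.
lines = iter([
    "#@ <id1>\n",
    "<Alice> <USA> <12345>\n",
    "\n",
    "#@ <id1>\n",
    "<Jane> <France> <78900>\n",
])

blocks = []
for line in lines:
    if line.startswith("#@"):
        block = [line]
        # str.strip returns "" (falsy) for a blank line, which stops takewhile;
        # the blank line itself is consumed from the shared iterator and dropped.
        block.extend(takewhile(str.strip, lines))
        blocks.append(block)
```

After the loop, blocks holds one list of lines per entry, without the blank separators.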

To save the entries to a database:

import sqlite3

with sqlite3.connect('example.db') as conn:
    conn.execute('''CREATE TABLE IF NOT EXISTS entries
             (id text, name text, address text, phone text)''')
    conn.executemany('INSERT INTO entries VALUES (?,?,?,?)',
                     get_entries('example.ttl'))
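
Note that using a sqlite3 connection as a context manager commits or rolls back the transaction but does not close the connection. A variant that also closes it could look like the sketch below; the in-memory database and the hard-coded rows are assumptions standing in for example.db and get_entries('example.ttl').

```python
import sqlite3
from contextlib import closing

# Assumed sample rows standing in for get_entries('example.ttl').
entries = [("id1", "Alice", "USA", "12345"), ("id1", "Jane", "France", "78900")]

# closing() closes the connection at the end of the block; the inner
# `with conn` only commits on success or rolls back on an exception.
with closing(sqlite3.connect(":memory:")) as conn:
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS entries "
                     "(id text, name text, address text, phone text)")
        conn.executemany("INSERT INTO entries VALUES (?,?,?,?)", entries)
    stored = conn.execute(
        "SELECT name, phone FROM entries ORDER BY name").fetchall()
```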

If you need to do some post-processing in Python, group the rows by id:

import sqlite3
from itertools import groupby
from operator import itemgetter

with sqlite3.connect('example.db') as c:
    rows = c.execute('SELECT * FROM entries ORDER BY id LIMIT ?', (10,))
    for id, group in groupby(rows, key=itemgetter(0)):
        print("%s:\n\t%s" % (id, "\n\t".join(map(str, group))))

Output:

id1:
    ('id1', 'Alice', 'USA', '12345')
    ('id1', 'Jane', 'France', '78900')
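
For the rdflib route mentioned at the start of this answer, a minimal sketch might look like the following. It assumes rdflib is installed and uses its "turtle" format name; the sample document and the ex: prefix are made up for illustration. Turtle comments are not part of the RDF graph, so ids that exist only in `#@` comments would not appear among the parsed triples, which is why the manual parser above may still be needed.

```python
def load_triples(ttl_text):
    # rdflib is a third-party package (pip install rdflib); imported lazily here.
    from rdflib import Graph

    graph = Graph()
    graph.parse(data=ttl_text, format="turtle")
    # Convert each term to a plain string and sort for a stable order.
    return sorted((str(s), str(p), str(o)) for s, p, o in graph)

# Hypothetical sample document; the comment line is ignored by the parser.
SAMPLE = """\
@prefix ex: <http://example.org/> .
#@ <id1>
ex:alice ex:country "USA" ; ex:phone "12345" .
"""

if __name__ == "__main__":
    for triple in load_triples(SAMPLE):
        print(triple)
```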

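User

Answer 3 · edited 2024-05-15 21:09:52

There does not currently seem to be a dedicated library for parsing Turtle (Terse RDF Triple Language).

Since you already know the grammar, your best bet is to use PyParsing to define the grammar first and then parse the file with it.

I would also suggest adapting the following EBNF implementation to your needs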
