使用SQLAlchemy将记录更快地插入到表中

log_table = schema.Table('log_table', metadata, schema.Column('id', types.Integer, primary_key=True), schema.Column('time', types.DateTime), schema.Column('ip', types.String(length=15)) .... engine = create_engine(...) metadata.bind = engine connection = engine.connect() .... for line in file_to_parse: m = line_regex.match(line) if m: fields = m.groupdict() pythonified = pythoninfy_log(fields) #Turn them into ints, datatimes, etc if use_sql: ins = log_table.insert(values=pythonified) connection.execute(ins) parsed += 1

3条回答

网友

1楼 · 编辑于 2024-05-21 03:22:23

您应该尝试将一个事务放在多个插入周围，因为将数据库提交到磁盘确实需要很长时间。您将需要决定批处理级别，但粗略的第一个尝试是将一个事务包装在整个批中。在

网友

2楼 · 编辑于 2024-05-21 03:22:23

不知道表格引擎（MyISAM？InnoDB？），模式和索引，很难对您正在使用的两个数据库之间的细节进行评论。在

然而，当像这样使用MySQL时，您可能会发现将数据写到一个临时文本文件中，然后use the LOAD DATA INFILE syntax将其全部加载到数据库中要快得多。运行执行此操作所需的SQL看起来像you can call the execute method on your connection object。在

此外，如果您对逐行添加内容一成不变，并且每次都要重新创建表，则可以在程序中验证键约束，并仅在插入所有行之后才添加这些约束，这样可以节省数据库在每次插入时执行约束检查的时间。在

网友

3楼 · 编辑于 2024-05-21 03:22:23

为了实现批处理，我执行了以下操作：

inserts = []
insert_every = 1000
for line in file_to_parse:
    m = line_regex.match(line)
    if m:
        fields = m.groupdict()
        if use_sql: #This uses Globals, Ick :-/
            inserts.append(pythonified)
            if (parsed % insert_every) == 0:
                connection.execute(log_table.insert(), inserts)
                inserts = []
            parsed += 1
if use_sql:
    if len(inserts) > 0:
        connection.execute(log_table.insert(), inserts)

这不使用事务，但它以一种非常懒惰的方式允许我使用一个较小的示例将insert/parse阶段从大约13秒变为大约2秒。我将使用完整的示例了解mysql和sqlite之间的区别。在

我找到了这个here的基本信息。在

结果：
引擎：未分组插入以分钟为单位的时间：分组插入时间（以分钟为单位）
Sqlite：61：8
MySql:15个：2.5

我没有在mysql和sqlite之间刷新我的缓存，它们可能有源文本文件，但我不认为这是一个相对显著的区别。在

相关问题更多 >

编程相关推荐

热门问题

热门文章