Sharing a dict across multiple Python scripts

Posted 2024-04-25 07:50:45


I'd like to access a single dict-like (key/value) database from multiple Python scripts running at the same time.

If script1.py updates d[2839], then script2.py should see the modified value when it queries d[2839] a few seconds later.

What is the Pythonic solution for this?

Note: I'm on Windows, and the dict should hold at most 1M items (both keys and values are integers).


Tags: key py script database json sqlite process value
3 Answers

Before redis there was Memcached (which runs on Windows). Here is a tutorial: https://realpython.com/blog/python/python-memcache-efficient-caching/
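
A minimal sketch of the idea (not from the tutorial above), using the pymemcache client and assuming a memcached server is already running on localhost:11211; the key and values are the ones from the question:

import time
from pymemcache.client.base import Client

# memcached keys are strings/bytes, so the integer key is stringified here
client = Client(('localhost', 11211))

# script1.py would write:
client.set('2839', '42')

# script2.py, polling a few seconds later, would read:
time.sleep(2)
value = client.get('2839')   # returns b'42' (bytes), or None if missing
if value is not None:
    print(int(value))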

Most embedded datastores other than SQLite have no optimization for concurrent access. I was also curious about SQLite's concurrent performance, so I ran a benchmark:

import time
import sqlite3
import random
import multiprocessing


class Store():

    def __init__(self, filename='kv.db'):
        # A long timeout lets writers wait out each other's locks
        # instead of failing immediately with "database is locked".
        self.conn = sqlite3.connect(filename, timeout=60)
        # WAL mode lets readers proceed concurrently with a writer.
        self.conn.execute('pragma journal_mode=wal')
        self.conn.execute('create table if not exists "kv" (key integer primary key, value integer) without rowid')
        self.conn.commit()

    def get(self, key):
        row = self.conn.execute('select value from "kv" where key=?', (key,)).fetchone()
        if row:
            return row[0]

    def set(self, key, value):
        self.conn.execute('replace into "kv" (key, value) values (?,?)', (key, value))
        self.conn.commit()


def worker(n):
    # Each worker writes n random keys, then reads them all back
    # in shuffled order.
    d = [random.randint(0, 1 << 31) for _ in range(n)]
    s = Store()
    for i in d:
        s.set(i, i)
    random.shuffle(d)
    for i in d:
        s.get(i)


def test(c):
    n = 5000
    start = time.time()
    ps = []
    for _ in range(c):
        p = multiprocessing.Process(target=worker, args=(n,))
        p.start()
        ps.append(p)
    while any(p.is_alive() for p in ps):
        time.sleep(0.01)
    cost = time.time() - start
    print(f'{c:<10d}\t{cost:<7.2f}\t{n/cost:<20.2f}\t{n*c/cost:<14.2f}')


def main():
    print('concurrency\ttime(s)\tper process TPS(r/s)\ttotal TPS(r/s)')
    for c in range(1, 9):
        test(c)


if __name__ == '__main__':
    main()

Results on my 4-core macOS box, SSD volume:

[benchmark table lost from the original page]

Results on an 8-core Windows Server 2012 cloud server, SSD volume:

[benchmark table lost from the original page]

The results show that total throughput stays roughly constant regardless of concurrency, and that SQLite is slower on Windows than on macOS. Hope this helps.


Since SQLite's write lock is per database file, you can get more TPS by partitioning the data across multiple database files:

class MultiDBStore():

    def __init__(self, buckets=5):
        # One SQLite file per bucket: writers on different buckets
        # don't block each other, since the write lock is per file.
        self.buckets = buckets
        self.conns = []
        for n in range(buckets):
            conn = sqlite3.connect(f'kv_{n}.db', timeout=60)
            conn.execute('pragma journal_mode=wal')
            conn.execute('create table if not exists "kv" (key integer primary key, value integer) without rowid')
            conn.commit()
            self.conns.append(conn)

    def _get_conn(self, key):
        # Route each integer key to a fixed bucket.
        assert isinstance(key, int)
        return self.conns[key % self.buckets]

    def get(self, key):
        row = self._get_conn(key).execute('select value from "kv" where key=?', (key,)).fetchone()
        if row:
            return row[0]

    def set(self, key, value):
        conn = self._get_conn(key)
        conn.execute('replace into "kv" (key, value) values (?,?)', (key, value))
        conn.commit()

Results on my Mac with 20 partitions:

concurrency time(s) per process TPS(r/s)    total TPS(r/s)
1           2.07    4837.17                 4837.17
2           2.51    3980.58                 7961.17
3           3.28    3047.68                 9143.03
4           4.02    2486.76                 9947.04
5           4.44    2249.94                 11249.71
6           4.76    2101.26                 12607.58
7           5.25    1903.69                 13325.82
8           5.71    1752.46                 14019.70

The total TPS is higher than with a single database file.
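
Not part of the original answer: a minimal sketch of how the Store class above could answer the original question, with two hypothetical scripts sharing one database file (the script names, module name store.py, and polling interval are illustrative):

# writer.py -- run in one terminal
from store import Store      # assumes the Store class above is saved as store.py

s = Store('shared.db')
s.set(2839, 123)             # the update that reader.py should observe

# reader.py -- run concurrently in another terminal
import time
from store import Store

s = Store('shared.db')
while True:
    print(s.get(2839))       # prints 123 once writer.py has committed
    time.sleep(2)            # poll every couple of seconds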

I would consider two options, both embedded databases:

SQLite

As answered here and in {a2}, it should be fine.

Berkeley DB

link

Berkeley DB (BDB) is a software library intended to provide a high-performance embedded database for key/value data

It's designed exactly for your use case:

BDB can support thousands of simultaneous threads of control or concurrent processes manipulating databases as large as 256 terabytes, on a wide variety of operating systems including most Unix-like and Windows systems, and real-time operating systems.

It's robust and has been around for years, if not decades.
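
Not from the original answer: a minimal sketch of BDB's dict-like interface, assuming the third-party bsddb3 package (the stdlib bsddb module was removed in Python 3). Keys and values must be bytes; note that fully safe concurrent access from multiple processes additionally needs a shared DBEnv, which is omitted here for brevity:

import bsddb3

# 'c' creates the file if it doesn't exist; hashopen returns a dict-like object
d = bsddb3.hashopen('shared.bdb', 'c')

d[b'2839'] = b'123'      # integer keys/values must be encoded as bytes
d.sync()                 # flush to disk so other processes can see the update

print(int(d[b'2839']))   # -> 123
d.close()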

Spinning up redis/memcached/any other full socket-based server that needs sysops attention is overkill for the task of exchanging data between two scripts on the same machine.
