python aiomysql query time grows as the number of parallel tasks grows
I am using Python's aiomysql library to query a MySQL database from an async application. I also use anyio to run parallel tasks via create_task_group. Sometimes I need to fetch data from the MySQL database in chunks, and to speed up the queries I create a task group and run the queries in parallel. The problem is that as the number of tasks grows, the time of each individual query grows as well. Here is a sample code snippet:
import os

import aiomysql
from time import time
from anyio import run, create_task_group
from dotenv import load_dotenv

diffs = []  # contains each single query time
tot_reqs = 0  # count number of total requests


async def measure(pool):
    global diffs
    global tot_reqs
    async with pool.acquire() as conn:
        start_time = time()
        while True:
            t1 = time()
            cur = await conn.cursor()
            sql = "select * from ttable limit 10000"
            await cur.execute(query=sql)
            await cur.close()
            diffs.append(time() - t1)
            tot_reqs += 1
            if len(diffs) > 9:  # print mean time of every 10 queries
                print(
                    f"[{time()-start_time:.2f}s since started] Mean query time: {sum(diffs)/len(diffs)} (total requests performed: {tot_reqs})"
                )
                diffs = []


async def case():
    db_env_prefix = "SOME_MYSQL"
    pool = await aiomysql.create_pool(
        host=os.getenv(db_env_prefix + "_HOST"),
        port=int(os.getenv(db_env_prefix + "_PORT") or 3306),
        user=os.getenv(db_env_prefix + "_USERNAME"),
        password=os.getenv(db_env_prefix + "_PASSWORD"),
        db=os.getenv(db_env_prefix + "_DB"),
        maxsize=10,
        autocommit=True,
        pool_recycle=600,
    )
    async with create_task_group() as tg:
        for _ in range(10):
            tg.start_soon(measure, pool)
    pool.close()
    await pool.wait_closed()


if __name__ == "__main__":
    load_dotenv()
    run(case)
The printed output is:
[0.83s since started] Mean query time: 0.5825932741165161 (total requests performed: 10)
[2.95s since started] Mean query time: 0.989139986038208 (total requests performed: 20)
[4.23s since started] Mean query time: 1.238171124458313 (total requests performed: 30)
[5.53s since started] Mean query time: 1.2967675924301147 (total requests performed: 40)
[6.83s since started] Mean query time: 1.2985524892807008 (total requests performed: 50)
[8.14s since started] Mean query time: 1.3030725479125977 (total requests performed: 60)
[9.44s since started] Mean query time: 1.3051365852355956 (total requests performed: 70)
[10.75s since started] Mean query time: 1.3047129154205321 (total requests performed: 80)
[12.05s since started] Mean query time: 1.3064133167266845 (total requests performed: 90)
[13.36s since started] Mean query time: 1.3030510425567627 (total requests performed: 100)
So with 10 tasks running in parallel, about 100 requests finished in roughly 13 seconds, with a mean query time of 1.3 s per query. Then I ran only a single task:
async with create_task_group() as tg:
    for _ in range(1):
        tg.start_soon(measure, pool)
and got:
[1.24s since started] Mean query time: 0.12407510280609131 (total requests performed: 10)
[2.54s since started] Mean query time: 0.13026781082153321 (total requests performed: 20)
[3.85s since started] Mean query time: 0.13041894435882567 (total requests performed: 30)
[5.16s since started] Mean query time: 0.13082268238067626 (total requests performed: 40)
[6.47s since started] Mean query time: 0.13111093044281005 (total requests performed: 50)
[7.78s since started] Mean query time: 0.13118529319763184 (total requests performed: 60)
[9.09s since started] Mean query time: 0.1312186002731323 (total requests performed: 70)
[10.40s since started] Mean query time: 0.13101680278778077 (total requests performed: 80)
[11.72s since started] Mean query time: 0.13155796527862548 (total requests performed: 90)
[13.03s since started] Mean query time: 0.131210994720459 (total requests performed: 100)
The same 100 requests finished in roughly the same 13 seconds (a single task runs its queries about 10x faster, with a mean query time of 0.13 s). There is no CPU-bound work here, only IO requests.
So increasing the number of tasks does not actually improve query time: the more tasks, the slower each individual query. I tried querying different tables and changing MySQL settings such as increasing innodb_thread_concurrency and innodb_read_io_threads. It does not look like a table or database configuration problem, because when I run the same code in several terminals at the same time, the individual query time is not affected and behaves the way I expected tg.start_soon to behave. Multiprocessing can work around this, but it has its own drawbacks. Besides, this behaviour seems strange for IO-bound work: async requests should not block each other.
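For reference, here is a minimal sketch of the multiprocessing workaround mentioned above, not a recommendation: each worker process starts its own event loop and its own aiomysql pool, and reuses the measure() coroutine and environment variables from the snippet above (worker, start_worker and the 5x2 process/task split are illustrative names and numbers, not part of my real code):

import os
import aiomysql
from multiprocessing import Process
from anyio import run, create_task_group
from dotenv import load_dotenv


async def worker(tasks_per_process):
    # each process builds its own pool, same settings as case() above
    db_env_prefix = "SOME_MYSQL"
    pool = await aiomysql.create_pool(
        host=os.getenv(db_env_prefix + "_HOST"),
        port=int(os.getenv(db_env_prefix + "_PORT") or 3306),
        user=os.getenv(db_env_prefix + "_USERNAME"),
        password=os.getenv(db_env_prefix + "_PASSWORD"),
        db=os.getenv(db_env_prefix + "_DB"),
        maxsize=tasks_per_process,
        autocommit=True,
        pool_recycle=600,
    )
    async with create_task_group() as tg:
        for _ in range(tasks_per_process):
            tg.start_soon(measure, pool)  # measure() as defined in the question
    pool.close()
    await pool.wait_closed()


def start_worker(tasks_per_process):
    load_dotenv()
    run(worker, tasks_per_process)


if __name__ == "__main__":
    # e.g. 5 processes x 2 tasks each instead of 1 process x 10 tasks
    procs = [Process(target=start_worker, args=(2,)) for _ in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()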
Update: as requested in the comments:
MariaDB [(none)]> SELECT 'SLEEPING MDB Ram use', COUNT(*),SUM(time),SUM(memory_used),SUM(max_memory_used) FROM information_schema.processlist WHERE command="Sleep";
+----------------------+----------+-----------+------------------+----------------------+
| SLEEPING MDB Ram use | COUNT(*) | SUM(time) | SUM(memory_used) | SUM(max_memory_used) |
+----------------------+----------+-----------+------------------+----------------------+
| SLEEPING MDB Ram use | 1 | 309 | 79632 | 162472 |
+----------------------+----------+-----------+------------------+----------------------+
1 Answer
Here are some suggestions to consider for the [mysqld] section of your my.cnf file:
thread_pool_size=6 # from 8 - suggestion in doc is 80% of your 8 cores max
table_open_cache_instances=1 # from 8 until you have more than 1,000 tables
net_buffer_length=98304 # from 16384 to reduce packet in/out count
innodb_io_capacity=900 # from 200 to use more of your NVMe IOPS
tmp_table_size=33554532 # from 16M to expand capacity
max_heap_table_size=33554532 # from 16M to reduce created_tmp_disk_tables
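If it helps, here is a small sketch (my own addition, not something you have to use) of how you could confirm from the same aiomysql application that the new values were picked up after restarting the server; the connection parameters mirror the ones in your question:

import os
import aiomysql
from anyio import run
from dotenv import load_dotenv


async def show_settings():
    db_env_prefix = "SOME_MYSQL"
    conn = await aiomysql.connect(
        host=os.getenv(db_env_prefix + "_HOST"),
        port=int(os.getenv(db_env_prefix + "_PORT") or 3306),
        user=os.getenv(db_env_prefix + "_USERNAME"),
        password=os.getenv(db_env_prefix + "_PASSWORD"),
        db=os.getenv(db_env_prefix + "_DB"),
    )
    async with conn.cursor() as cur:
        # print the current global values of the variables suggested above
        await cur.execute(
            "SHOW GLOBAL VARIABLES WHERE Variable_name IN "
            "('thread_pool_size', 'table_open_cache_instances', "
            "'net_buffer_length', 'innodb_io_capacity', "
            "'tmp_table_size', 'max_heap_table_size')"
        )
        for name, value in await cur.fetchall():
            print(f"{name} = {value}")
    conn.close()


if __name__ == "__main__":
    load_dotenv()
    run(show_settings)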
There are many more ways to improve performance; feel free to check my profile.
Hopefully your queries will complete faster. When you push a batch of queries through a single connection at once, you may not get the degree of parallel processing you are hoping for. If you have the time, please share the new completion times after implementing these changes.