Django+芹菜。查询期间与MySQL服务器的连接中断

2024-06-16 11:08:00 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个Django应用程序,可以运行一些芹菜链接任务。 第一个任务向一个外部应用程序请求它的DB转储,第二个任务下载它,第三个任务给我带来问题

  • 首先,它通过Popen打开垃圾堆的拉链,大约需要30分钟
  • 之后,它还通过Popen运行sed命令,删除一些我不使用的表。这个sed命令大约需要一个小时
  • 最后,我再次运行Popen来恢复转储
proc = Popen([
            "mysql", "-h{}".format(settings.MYSQL_HOST), "-u{}".format(settings.MYSQL_USER),
            "-p{}".format(settings.MYSQL_PASSWORD), snapshot_restore.database_name
        ], stdout=PIPE, stderr=PIPE, stdin=sed_dumpfile)

这个过程大约需要8小时

  • 一旦恢复完成,它将更新数据库中的django模型,该信号将触发下一个任务

snapshot_restore = SnapshotRestore.objects.get(id=snapshot_restore_id)
snapshot_restore.status = "processing"
snapshot_restore.save()

# Here I make the unzip, the sed and the restore

snapshot_restore.status = "finished"
snapshot_restore.database_status = "active"
snapshot_restore.save()

但当我有这个错误的时候

[2020-04-19 11:33:20,723: WARNING/ForkPoolWorker-2] File "/usr/local/lib/python3.6/site-packages/django/db/backends/mysql/base.py", line 266, in _set_autocommit
    self.connection.autocommit(autocommit)
[2020-04-19 11:33:20,723: WARNING/ForkPoolWorker-2] File "/usr/local/lib/python3.6/site-packages/MySQLdb/connections.py", line 256, in autocommit
    _mysql.connection.autocommit(self, on)
[2020-04-19 11:33:20,723: WARNING/ForkPoolWorker-2] _mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')
[2020-04-19 11:33:20,723: WARNING/ForkPoolWorker-2] The above exception was the direct cause of the following exception:
[2020-04-19 11:33:20,723: WARNING/ForkPoolWorker-2] Traceback (most recent call last):
[2020-04-19 11:33:20,723: WARNING/ForkPoolWorker-2] File "/srv/src/core/tasks.py", line 243, in restore_snapshot_task
    snapshot_restore.save()

我已经尝试了一些方法来解决这个问题

首先,尝试在保存实体之前再次获取该实体:

    try:
        snapshot_restore.save()
    except Exception as e:
        logger.error('Error updating snapshot_restore status, will retry.')
        snapshot_restore = SnapshotRestore.objects.get(id=snapshot_restore_id)
        snapshot_restore.status = "finished"
        snapshot_restore.database_status = "active"
        snapshot_restore.save()

但还是有同样的错误

[2020-04-19 11:33:20,701: INFO/ForkPoolWorker-2] core.tasks.restore_snapshot_task[None]: restore complete in 28783.03s
[2020-04-19 11:33:20,701: ERROR/ForkPoolWorker-2] core.tasks.restore_snapshot_task[None]: Error updating snapshot_restore status, will retry.
[2020-04-19 11:33:20,720: WARNING/ForkPoolWorker-2] --- Logging error ---
[2020-04-19 11:33:20,722: WARNING/ForkPoolWorker-2] Traceback (most recent call last):
[2020-04-19 11:33:20,723: WARNING/ForkPoolWorker-2] File "/usr/local/lib/python3.6/site-packages/django/db/backends/mysql/base.py", line 266, in _set_autocommit
    self.connection.autocommit(autocommit)
[2020-04-19 11:33:20,723: WARNING/ForkPoolWorker-2] File "/usr/local/lib/python3.6/site-packages/MySQLdb/connections.py", line 256, in autocommit
    _mysql.connection.autocommit(self, on)
[2020-04-19 11:33:20,723: WARNING/ForkPoolWorker-2] _mysql_exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')

递增MySql交互超时

我尝试增加MySql空闲连接时间(interactive_timeout),因为这是一个AWS RDS实例,我必须通过参数组来完成,如我在这里发现的:

Amazon RDS unable to execute SET GLOBAL command

我将其设置为12小时,然后终止一个空闲连接,但仍然得到相同的“断开连接”错误

创建了一个单独的线程,该线程一直在查询de DB

我写了这个函数

def keep_connection_alive(id, stop):
    logger.info("Starting keep_connection_alive iteration id: " + str(id))
    while True:
        time.sleep(600)
        try:
            SnapshotRestore.objects.get(id=randrange(10))
        except Exception as e:
            logger.error('Error keeping alive: ' + str(e))
        if stop():
            print('Stopping keep_connection_alive')
            break
    logger.info('keep_connection_alive out of while')

并在任务开始时这样称呼它

stop_thread = False
keep_alive_thread = threading.Thread(target=keep_connection_alive, args=(1, lambda: stop_thread))
keep_alive_thread.start()

我有很多

[2020-04-19 07:46:47,523: ERROR/ForkPoolWorker-2] ???[???]: Error keeping alive: SnapshotRestore matching query does not exist.

因为大多数时候,身份证都不存在。我运行了这段代码两次,第一次是在启动进程8小时后出现错误

[2020-04-18 10:28:35,893: ERROR/ForkPoolWorker-2] ???[???]: Error keeping alive: SnapshotRestore matching query does not exist.
[2020-04-18 10:38:36,016: ERROR/ForkPoolWorker-2] ???[???]: Error keeping alive: SnapshotRestore matching query does not exist.
[2020-04-18 10:52:56,766: ERROR/ForkPoolWorker-2] ???[???]: Error keeping alive: (2013, 'Lost connection to MySQL server during query')
[2020-04-18 11:02:56,867: ERROR/ForkPoolWorker-2] ???[???]: Error keeping alive: (2006, 'MySQL server has gone away')
[2020-04-18 11:12:56,955: ERROR/ForkPoolWorker-2] ???[???]: Error keeping alive: (2006, 'MySQL server has gone away')

第二次我得到它时,它正试图保存恢复

snapshot_restore.save()

从这里到哪里去

我只剩下一个想法了,那就是触发另一个芹菜任务,它只是改变了模型的状态,所以信号也会被触发,下一个任务就会开始。 但除了测试和祈祷它能起作用之外,我想了解为什么会发生这种情况,以及是否有可能的解决办法

提前谢谢


Tags: idstatusmysqlsnapshotrestorekeepingerrorconnection