Celery: exception handling for the global time limit

Published 2024-04-25 21:29:03

When a Celery time limit is hit, the job is killed and is not put back on my Redis queue. Is there a way to catch these timeout errors?

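I have a web scraper that makes a very large number of requests. When I submit 5 million (5MM) requests, I notice that around 100,000 of them may be dropped because of the global time limit.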

I pass a 500-second global (hard) time limit to the Celery worker with: celery worker -Ofair --concurrency=600 --without-gossip --time-limit=500 --pool=gevent -l critical

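from celery.exceptions import SoftTimeLimitExceeded

# `task` is assumed to be the Celery task decorator; its import is not shown here.
@task(soft_time_limit=16)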
def urlopen(url):
    print('-open: {0}'.format(url))
    try:
        content, status_code, parsed_data = get_session(url)

        # Control flow for what we are going to do with the responses based on status code
        if status_code == 200:
            # Check for None first: len(None) would raise TypeError.
            if parsed_data is None:
                add_urls_to_redis(url)
            elif len(parsed_data) > 0:
                for item in parsed_data:
                    add_url_es(item[1], item[2], item[0], url, es)
                add_200_urls_to_redis(url)
            else:
                print(status_code, parsed_data, url)
                add_failed_to_redis(url)

        elif status_code == 403:
            print(' this page has been denied due to {0} error for url {1}'.format(status_code, url))
            add_urls_to_redis(url)
        elif status_code == 404:
            print(' this page does not seem to exist due to {0} error for url {1}'.format(status_code, url))
            add_404_urls_to_redis(url)
        else:
            print('something is really wrong, error code {0}, url = {1}'.format(status_code, url))
            add_urls_to_redis(url)

    except SoftTimeLimitExceeded:
        print('time limit exceeded')
        add_urls_to_redis(url)
    except Exception as e:
        print('unaccounted for error {}'.format(e))
        add_urls_to_redis(url)

I can handle the soft time limit easily, but I would like to know how to catch the global (hard) one.
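For reference, one possible fallback that does not depend on catching the hard limit inside the task is to reconcile after the run: compare the URLs that were submitted against the URLs that ended up in any outcome list, and requeue whatever is missing. The sketch below is only an illustration under stated assumptions: plain Python lists and sets stand in for the Redis structures, and the function name `find_dropped_urls` and the example URLs are hypothetical, not part of the code above.

```python
def find_dropped_urls(submitted, *outcome_buckets):
    """Return submitted URLs that appear in none of the outcome buckets.

    `submitted` is an iterable of URLs that were queued; each bucket is an
    iterable of URLs the task recorded (success, retry, 404, failed, ...).
    The result preserves submission order.
    """
    seen = set()
    for bucket in outcome_buckets:
        seen.update(bucket)
    return [url for url in submitted if url not in seen]


# Hypothetical data: in-memory stand-ins for what would live in Redis.
submitted = [
    "http://a.example/1",
    "http://a.example/2",
    "http://a.example/3",
]
done_200 = {"http://a.example/1"}   # recorded as successful
to_retry = {"http://a.example/3"}   # recorded for another attempt

# URLs that never reached any bucket, e.g. because the job was killed.
dropped = find_dropped_urls(submitted, done_200, to_retry)
print(dropped)
```

With the sample data above, only the second URL is reported as dropped, since it is the only one missing from every bucket. The dropped URLs could then be resubmitted as new tasks.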

