Python 3 concurrent.futures: how to re-add a failed Future to a ThreadPoolExecutor?

4 votes

1 answer

6538 views

Asked 2025-04-18 17:57

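I have a batch of URLs that I want to download with the ThreadPoolExecutor from concurrent.futures. Some of them may time out, and I want to download those again after the first attempt. I don't know how to do that. Below is the code I tried, but it prints 'time_out_again' endlessly: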

import concurrent.futures

def player_url(url):
    # here. if timeout, return 1. otherwise do I/O and return 0.
    ...

urls = [...]
time_out_futures = [] #list to accumulate timeout urls
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    future_to_url = (executor.submit(player_url, url) for url in urls)
    for future in concurrent.futures.as_completed(future_to_url):
        if future.result() == 1:
            time_out_futures.append(future)

# here is what I try to deal with all the timeout urls       
while time_out_futures:
    future = time_out_futures.pop()
    if future.result() == 1:
        print('time_out_again')
        time_out_futures.insert(0,future)   # add back to the list

Is there a way to solve this?

multithreading threadpoolexecutor concurrent.futures url_download timeout_handling future_retry

1 Answer

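A Future object can only be used once. The Future itself knows nothing about the function whose result it holds. Creating the Future is the job of the ThreadPoolExecutor, which returns the Future and runs the function in the background: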

def submit(self, fn, *args, **kwargs):
    with self._shutdown_lock:
        if self._shutdown:
            raise RuntimeError('cannot schedule new futures after shutdown')

        f = _base.Future()
        w = _WorkItem(f, fn, args, kwargs)

        self._work_queue.put(w)
        self._adjust_thread_count()
        return f

class _WorkItem(object):
    def __init__(self, future, fn, args, kwargs):
        self.future = future
        self.fn = fn
        self.args = args
        self.kwargs = kwargs

    def run(self):
        if not self.future.set_running_or_notify_cancel():
            return

        try:
            result = self.fn(*self.args, **self.kwargs)  # self.fn is player_url in your case
        except BaseException as e:
            self.future.set_exception(e)
        else:
            self.future.set_result(result)  # The result is set on the Future
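
Calling result() a second time does not rerun the function; it only returns the value that set_result stored. A minimal sketch to check this follows. The task function, the calls list, and the URL are made up for illustration and stand in for player_url and a real address.

```python
import concurrent.futures

calls = []  # records every real invocation of the task


def task(url):
    # Hypothetical stand-in for player_url: pretend every attempt times out.
    calls.append(url)
    return 1


with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(task, "http://example.invalid/a")
    first = future.result()
    second = future.result()  # returns the stored value; task is not called again

print(first, second, len(calls))  # 1 1 1
```

The function ran once even though result() was called twice, so polling an old Future can never produce a different outcome.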

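As the run method shows, once the function finishes, its result is set on the Future. Because the Future has no reference to the function that produced the result, you cannot rerun the function through the Future. That is also why your while loop never ends: future.result() keeps returning the same stored 1. What you can do is return both the status and the URL when a timeout happens (1 plus the URL), and then resubmit that URL to the ThreadPoolExecutor: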

import concurrent.futures

def player_url(url):
    # here. if timeout, return 1. otherwise do I/O and return 0.
    ...
    if timeout:  # placeholder: set by the elided download code above
        return (1, url)
    else:
        return (0, url)

urls = [...]
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    while urls:
        # map() yields the return values directly, not Future objects.
        results = executor.map(player_url, urls)
        urls = []  # Clear urls list, we'll re-add any timed out operations.
        for status, url in results:
            if status == 1:
                urls.append(url)  # stick url into list for another attempt
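
If a URL never stops timing out, a loop that retries until the list is empty will run forever. One possible variation, shown here only as a sketch, caps the number of rounds and uses submit() with as_completed(), keeping a dict from each new Future to its URL so the worker can keep returning a plain 0 or 1 as in the question. The names download_with_retries, max_attempts, and fake_fetch are hypothetical and not part of concurrent.futures; fake_fetch simulates timeouts without touching the network.

```python
import concurrent.futures
import threading


def download_with_retries(fetch, urls, max_attempts=3, max_workers=10):
    """Call fetch(url) for each URL and resubmit the ones that time out.

    fetch must return 1 on timeout and 0 on success, like player_url
    in the question. Returns the URLs that still failed at the end.
    """
    pending = list(urls)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        for _ in range(max_attempts):
            if not pending:
                break
            # Each round creates fresh Futures; the dict remembers which
            # URL each one belongs to, since the Future itself does not.
            future_to_url = {executor.submit(fetch, url): url for url in pending}
            pending = []
            for future in concurrent.futures.as_completed(future_to_url):
                if future.result() == 1:
                    pending.append(future_to_url[future])
    return pending


# Usage with a simulated downloader (an assumption for this demo only).
attempts = {}
attempts_lock = threading.Lock()


def fake_fetch(url):
    with attempts_lock:
        attempts[url] = attempts.get(url, 0) + 1
        count = attempts[url]
    if url == "dead":
        return 1  # always times out
    if url == "slow":
        return 0 if count >= 2 else 1  # succeeds on the second attempt
    return 0  # succeeds immediately


failed = download_with_retries(fake_fetch, ["ok", "slow", "dead"], max_attempts=3)
print(failed)  # ['dead']
```

Whether to cap the retries, and at what count, depends on how the timeouts behave in practice; a delay between rounds may also be worth adding for real servers.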

Answered 2025-04-18 by Python大师
