Asynchronous HTTP calls in Python

12 votes
4 answers
22850 views
Asked 2025-04-16 11:36

I need some sort of callback functionality in Python: I'm sending a request to a web service multiple times, with a change to the parameters each time. I want these requests to happen concurrently instead of sequentially, so I want the calls to execute asynchronously.

It looks like asyncore might be what I want, but the examples I've seen of it in use all look like overkill, so I'm wondering if there's a simpler path. Any suggestions on modules or an approach? Ideally I'd like to use these in a procedural fashion instead of creating classes, though I may not be able to avoid that.

4 Answers

16

Do you know about eventlet? It lets you write what looks like synchronous code, but have it run asynchronously over the network.

Here's an example of a super minimal crawler (the code below is Python 2, since it uses urllib2 and the print statement):

urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
     "https://wiki.secondlife.com/w/images/secondlife.jpg",
     "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

import eventlet
from eventlet.green import urllib2

def fetch(url):

  return urllib2.urlopen(url).read()

pool = eventlet.GreenPool()

for body in pool.imap(fetch, urls):
  print "got body", len(body)
18

Starting with Python 3.2, you can use concurrent.futures to launch parallel tasks, i.e. to run several tasks at the same time.

Here's a ThreadPoolExecutor example:

http://docs.python.org/dev/library/concurrent.futures.html#threadpoolexecutor-example

It spawns several threads to fetch HTML and acts on the responses as they are received.

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
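
As an aside, if you don't need results as soon as each one completes, executor.map is a more compact alternative. A sketch (note that map yields results in input order, and re-raises a worker's exception as soon as its slot is reached, which ends the loop early):

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # map preserves input order, so zip pairs each URL with its page body
    for url, data in zip(URLS, executor.map(lambda u: load_url(u, 60), URLS)):
        print('%r page is %d bytes' % (url, len(data)))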

The example above uses threads. There is also a similar ProcessPoolExecutor that uses a pool of processes rather than threads:

http://docs.python.org/dev/library/concurrent.futures.html#processpoolexecutor-example

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# A __main__ guard is required so that worker processes can import this module safely
if __name__ == '__main__':
    # We can use a with statement to ensure processes are cleaned up promptly
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        # Start the load operations and mark each future with its URL
        future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%r page is %d bytes' % (url, len(data)))
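
For I/O-bound work like fetching URLs, threads are usually the better fit, since the GIL is released while waiting on the network; a process pool pays off for CPU-bound tasks. A hedged sketch of that case (count_primes is a hypothetical CPU-bound function, not the example from the linked docs):

import concurrent.futures

def count_primes(limit):
    # Naive trial division, deliberately CPU-bound
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == '__main__':
    limits = [10000, 20000, 40000]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Each limit is handled by a separate worker process
        for limit, total in zip(limits, executor.map(count_primes, limits)):
            print('%d primes below %d' % (total, limit))
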
8

The Twisted framework is just the ticket for this. But if you don't want to take that on, you could also use pycurl, a wrapper for libcurl, which has its own asynchronous event loop and supports callbacks.
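
For reference, a minimal sketch of pycurl's multi interface, assuming pycurl is installed (option names such as WRITEDATA can vary a little across pycurl versions):

import pycurl
from io import BytesIO

urls = ['http://www.google.com/', 'http://www.bbc.co.uk/']
multi = pycurl.CurlMulti()
buffers = {}

for url in urls:
    handle = pycurl.Curl()
    buf = BytesIO()
    handle.setopt(pycurl.URL, url)
    handle.setopt(pycurl.WRITEDATA, buf)  # collect the response body in memory
    multi.add_handle(handle)
    buffers[handle] = (url, buf)

# Drive every transfer on a single event loop until all handles are done
num_active = len(urls)
while num_active:
    ret, num_active = multi.perform()
    if ret != pycurl.E_CALL_MULTI_PERFORM:
        multi.select(1.0)  # wait until some socket is ready

for handle, (url, buf) in buffers.items():
    print('%r page is %d bytes' % (url, len(buf.getvalue())))
    multi.remove_handle(handle)
    handle.close()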
