Asynchronous HTTP calls in Python

12 votes
4 answers
22850 views
Asked 2025-04-16 11:36

I need some sort of callback functionality in Python: I'm sending a request to a web service multiple times, with a change to the parameters each time. I want these requests to happen concurrently instead of sequentially, so I want the calls to execute asynchronously.

It looks like asyncore might be what I want, but the examples I've seen of it in use all look like overkill, so I'm wondering if there's a simpler path. Any suggestions on modules or an approach? Ideally I'd like to use these in a procedural fashion instead of creating classes, though I may not be able to avoid that.

4 Answers

16

Do you know about eventlet? It lets you write what looks like synchronous code, but have it run asynchronously over the network.

Here's an example of a super minimal crawler (the code below is Python 2, since it uses urllib2 and the print statement):

urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
     "https://wiki.secondlife.com/w/images/secondlife.jpg",
     "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

import eventlet
from eventlet.green import urllib2

def fetch(url):

  return urllib2.urlopen(url).read()

pool = eventlet.GreenPool()

for body in pool.imap(fetch, urls):
  print "got body", len(body)
18

Starting with Python 3.2, you can use concurrent.futures to launch parallel tasks, i.e. to run several tasks at the same time.

Here's a ThreadPoolExecutor example:

http://docs.python.org/dev/library/concurrent.futures.html#threadpoolexecutor-example

It spawns several threads to fetch HTML and acts on the responses as they are received.

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
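
As an aside, if you don't need results as soon as each one completes, executor.map is a more compact alternative. A sketch (note that map yields results in input order, and re-raises a worker's exception as soon as its slot is reached, which ends the loop early):

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # map preserves input order, so zip pairs each URL with its page body
    for url, data in zip(URLS, executor.map(lambda u: load_url(u, 60), URLS)):
        print('%r page is %d bytes' % (url, len(data)))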

The example above uses threads. There is also a similar ProcessPoolExecutor that uses a pool of processes rather than threads:

http://docs.python.org/dev/library/concurrent.futures.html#processpoolexecutor-example

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# A __main__ guard is required so that worker processes can import this module safely
if __name__ == '__main__':
    # We can use a with statement to ensure processes are cleaned up promptly
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        # Start the load operations and mark each future with its URL
        future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%r page is %d bytes' % (url, len(data)))
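
For I/O-bound work like fetching URLs, threads are usually the better fit, since the GIL is released while waiting on the network; a process pool pays off for CPU-bound tasks. A hedged sketch of that case (count_primes is a hypothetical CPU-bound function, not the example from the linked docs):

import concurrent.futures

def count_primes(limit):
    # Naive trial division, deliberately CPU-bound
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == '__main__':
    limits = [10000, 20000, 40000]
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # Each limit is handled by a separate worker process
        for limit, total in zip(limits, executor.map(count_primes, limits)):
            print('%d primes below %d' % (total, limit))
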
8

The Twisted framework is just the ticket for this. But if you don't want to take that on, you could also use pycurl, a wrapper for libcurl, which has its own asynchronous event loop and supports callbacks.
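
For reference, a minimal sketch of pycurl's multi interface, assuming pycurl is installed (option names such as WRITEDATA can vary a little across pycurl versions):

import pycurl
from io import BytesIO

urls = ['http://www.google.com/', 'http://www.bbc.co.uk/']
multi = pycurl.CurlMulti()
buffers = {}

for url in urls:
    handle = pycurl.Curl()
    buf = BytesIO()
    handle.setopt(pycurl.URL, url)
    handle.setopt(pycurl.WRITEDATA, buf)  # collect the response body in memory
    multi.add_handle(handle)
    buffers[handle] = (url, buf)

# Drive every transfer on a single event loop until all handles are done
num_active = len(urls)
while num_active:
    ret, num_active = multi.perform()
    if ret != pycurl.E_CALL_MULTI_PERFORM:
        multi.select(1.0)  # wait until some socket is ready

for handle, (url, buf) in buffers.items():
    print('%r page is %d bytes' % (url, len(buf.getvalue())))
    multi.remove_handle(handle)
    handle.close()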
