Asynchronous HTTP calls in Python
I need a callback-style piece of functionality in Python: I am sending a request to a web service multiple times, with a change in the parameters each time. I want these requests to happen concurrently instead of sequentially, so I want the calls to run asynchronously.
It looks like asyncore might be what I want, but the examples I've seen of how it works all look like overkill, so I'm wondering whether there's a simpler path I should be going down. Any suggestions on modules or approaches? Ideally I'd like to use these in a procedural fashion instead of creating classes, but I may not be able to get around that.
4 Answers
16
Do you know about eventlet? It lets you write what looks like synchronous code, but have it operate asynchronously over the network.
Here's an example of a super minimal crawler:
urls = ["http://www.google.com/intl/en_ALL/images/logo.gif",
        "https://wiki.secondlife.com/w/images/secondlife.jpg",
        "http://us.i1.yimg.com/us.yimg.com/i/ww/beta/y3.gif"]

import eventlet
from eventlet.green import urllib2  # green (non-blocking) drop-in for urllib2

def fetch(url):
    return urllib2.urlopen(url).read()

pool = eventlet.GreenPool()

# imap runs fetch concurrently in green threads and yields results in order
for body in pool.imap(fetch, urls):
    print "got body", len(body)
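Since the question asks for a different parameter on each call, here is a minimal sketch of the same GreenPool firing one green thread per request and collecting each result. This is my own illustration rather than part of the example above; the endpoint and payloads are made up:

import eventlet
from eventlet.green import urllib2

def fetch(url, payload):
    # POST the payload just to show per-request parameters (hypothetical endpoint)
    return urllib2.urlopen(url, payload).read()

requests = [("http://example.com/api", "q=1"),
            ("http://example.com/api", "q=2"),
            ("http://example.com/api", "q=3")]

pool = eventlet.GreenPool()
# spawn returns a GreenThread; wait() blocks (cooperatively) until fetch returns
threads = [pool.spawn(fetch, url, payload) for url, payload in requests]
for gt in threads:
    print "got body", len(gt.wait())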
18
Starting with Python 3.2, you can use concurrent.futures to launch parallel tasks, i.e. run several tasks at the same time.
Check out this ThreadPoolExecutor example:
http://docs.python.org/dev/library/concurrent.futures.html#threadpoolexecutor-example
It spawns threads to retrieve HTML and acts on the responses as they are received.
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    conn = urllib.request.urlopen(url, timeout=timeout)
    return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
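If you don't need per-future bookkeeping, the same thread pool can also be driven with executor.map. This shorthand is my own sketch, reusing URLS and load_url from the example above, not part of the linked docs:

# map yields results in the order of URLS; unlike the as_completed loop above,
# an exception from load_url is re-raised when its slot is reached in the loop.
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    for url, data in zip(URLS, executor.map(lambda u: load_url(u, 60), URLS)):
        print('%r page is %d bytes' % (url, len(data)))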
The example above uses threads. There is also a similar ProcessPoolExecutor that uses a pool of processes rather than threads:
http://docs.python.org/dev/library/concurrent.futures.html#processpoolexecutor-example
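For completeness, a minimal sketch of the same pattern with ProcessPoolExecutor; this is my own illustration, not the example behind the link. Process pools suit CPU-bound work, the callable and its arguments must be picklable, and the __main__ guard is required on platforms that start workers by spawning:

import concurrent.futures

def count_primes(limit):
    # Deliberately naive CPU-bound work so each worker process is kept busy
    return sum(1 for n in range(2, limit)
               if all(n % d for d in range(2, int(n ** 0.5) + 1)))

if __name__ == '__main__':
    limits = [10000, 20000, 30000, 40000]
    # Each call to count_primes runs in a separate worker process
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for limit, primes in zip(limits, executor.map(count_primes, limits)):
            print('%d primes below %d' % (primes, limit))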