Making multiple calls with asyncio and adding the results to a dictionary

Published 2024-06-09 02:48:27

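I'm having a hard time getting to grips with Python 3's asyncio library. I have a list of zip codes, and I'm trying to make asynchronous calls to an API to get the city and state for each zip code. I can do it successfully in sequence with a for loop, but I want to make it faster for the case where the list of zip codes is large.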

This is an example of my original code, which works:

import urllib.request, json

zips = ['90210', '60647']

def get_cities(zipcodes):
    zip_cities = dict()
    for idx, zipcode in enumerate(zipcodes):
        url = 'http://maps.googleapis.com/maps/api/geocode/json?address='+zipcode+'&sensor=true'
        response = urllib.request.urlopen(url)
        string = response.read().decode('utf-8')
        data = json.loads(string)
        city = data['results'][0]['address_components'][1]['long_name']
        state = data['results'][0]['address_components'][3]['long_name']
        zip_cities.update({idx: [zipcode, city, state]})
    return zip_cities

results = get_cities(zips)
print(results)
# returns {0: ['90210', 'Beverly Hills', 'California'],
#          1: ['60647', 'Chicago', 'Illinois']}

This is my terrible, non-functional attempt at making it asynchronous:

import asyncio
import urllib.request, json

zips = ['90210', '60647']
zip_cities = dict()

@asyncio.coroutine
def get_cities(zipcodes):
    url = 'http://maps.googleapis.com/maps/api/geocode/json?address='+zipcode+'&sensor=true'
    response = urllib.request.urlopen(url)
    string = response.read().decode('utf-8')
    data = json.loads(string)
    city = data['results'][0]['address_components'][1]['long_name']
    state = data['results'][0]['address_components'][3]['long_name']
    zip_cities.update({idx: [zipcode, city, state]})

loop = asyncio.get_event_loop()
loop.run_until_complete([get_cities(zip) for zip in zips])
loop.close()
print(zip_cities) # doesn't work

Any help is much appreciated. All the tutorials I've come across online seem to be a bit over my head.

Note: I've seen some examples use aiohttp. I'd like to stick with the native Python 3 libraries if possible.


2 Answers

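If you use urllib to make the HTTP requests, you won't get any concurrency, because it is a synchronous library. Wrapping a function that calls urllib in a coroutine doesn't change that. You have to use an asynchronous HTTP client that is integrated with asyncio, such as aiohttp: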

import asyncio
import json
import aiohttp

zips = ['90210', '60647']
zip_cities = dict()

@asyncio.coroutine
def get_cities(zipcode,idx):
    url = 'https://maps.googleapis.com/maps/api/geocode/json?key=abcdfg&address='+zipcode+'&sensor=true'
    response = yield from aiohttp.request('get', url)
    string = (yield from response.read()).decode('utf-8')
    data = json.loads(string)
    print(data)
    city = data['results'][0]['address_components'][1]['long_name']
    state = data['results'][0]['address_components'][3]['long_name']
    zip_cities.update({idx: [zipcode, city, state]})

if __name__ == "__main__":        
    loop = asyncio.get_event_loop()
    tasks = [asyncio.async(get_cities(z, i)) for i, z in enumerate(zips)]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
    print(zip_cities)
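
To see why wrapping a blocking call in a coroutine gains nothing, here is a small stdlib-only sketch. It is not the geocoding code itself: `time.sleep` stands in for `urllib.request.urlopen`, `asyncio.sleep` stands in for an asyncio-aware client, and `DELAY`, `blocking_fetch`, `cooperative_fetch` and `timed` are names made up for the illustration. It assumes Python 3.7 or later for `asyncio.run`.

```python
import asyncio
import time

DELAY = 0.2  # hypothetical per-request latency, in seconds


async def blocking_fetch(zipcode):
    # time.sleep stands in for urllib.request.urlopen: it never yields
    # control to the event loop, so no other coroutine can run meanwhile.
    time.sleep(DELAY)
    return zipcode


async def cooperative_fetch(zipcode):
    # asyncio.sleep stands in for an asyncio-aware client: awaiting it
    # hands control back to the loop while the "request" is in flight.
    await asyncio.sleep(DELAY)
    return zipcode


async def timed(fetch, zipcodes):
    # Run one fetch per zip code through gather and measure the wall time.
    start = time.perf_counter()
    results = await asyncio.gather(*(fetch(z) for z in zipcodes))
    return results, time.perf_counter() - start


zips = ['90210', '60647']
_, blocking_elapsed = asyncio.run(timed(blocking_fetch, zips))
_, cooperative_elapsed = asyncio.run(timed(cooperative_fetch, zips))
print('blocking:    %.2fs' % blocking_elapsed)
print('cooperative: %.2fs' % cooperative_elapsed)
```

The blocking variant should take roughly `len(zips) * DELAY`, because each call holds the event loop until it returns, while the cooperative variant should take roughly one `DELAY` in total.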

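I know you'd prefer to use only the stdlib, but the asyncio library doesn't include an HTTP client, so you would basically have to re-implement parts of aiohttp to recreate the functionality it provides. I suppose one other option would be to make the urllib calls in a background thread so that they don't block the event loop, but that is somewhat silly when aiohttp is available (and it somewhat defeats the purpose of using asyncio in the first place):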

import asyncio
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

zips = ['90210', '60647']
zip_cities = dict()

@asyncio.coroutine
def get_cities(zipcode,idx):
    url = 'https://maps.googleapis.com/maps/api/geocode/json?key=abcdfg&address='+zipcode+'&sensor=true'
    response = yield from loop.run_in_executor(executor, urllib.request.urlopen, url)
    string = response.read().decode('utf-8')
    data = json.loads(string)
    print(data)
    city = data['results'][0]['address_components'][1]['long_name']
    state = data['results'][0]['address_components'][3]['long_name']
    zip_cities.update({idx: [zipcode, city, state]})

if __name__ == "__main__":
    executor = ThreadPoolExecutor(10)
    loop = asyncio.get_event_loop()
    tasks = [asyncio.async(get_cities(z, i)) for i, z in enumerate(zips)]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
    print(zip_cities)
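
On current Python versions the listing above no longer runs as written: `async` became a reserved word in Python 3.7, so `asyncio.async` is a syntax error, and `@asyncio.coroutine` was removed in Python 3.11. What follows is a rough sketch of the same thread-pool idea in `async`/`await` syntax, not the answerer's own code. `fetch_json`, `fake_fetch`, `CANNED` and `get_city` are names introduced here for illustration; `fake_fetch` returns hypothetical canned data shaped only as far as the original code reads it, so that the sketch runs without network access or an API key.

```python
import asyncio
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = ('https://maps.googleapis.com/maps/api/geocode/json'
       '?key=abcdfg&address={}&sensor=true')  # placeholder key, as above


def fetch_json(zipcode):
    # Blocking urllib call; only ever invoked inside a worker thread.
    with urllib.request.urlopen(URL.format(zipcode)) as response:
        return json.loads(response.read().decode('utf-8'))


# Hypothetical canned data so the sketch runs offline. Only indices 1 and 3
# of address_components are read, mirroring the original code.
CANNED = {'90210': ('Beverly Hills', 'California'),
          '60647': ('Chicago', 'Illinois')}


def fake_fetch(zipcode):
    city, state = CANNED[zipcode]
    parts = [{'long_name': name} for name in (zipcode, city, 'n/a', state)]
    return {'results': [{'address_components': parts}]}


async def get_city(zipcode, executor, fetch):
    # Hand the blocking fetch to the pool and await its result.
    loop = asyncio.get_running_loop()
    data = await loop.run_in_executor(executor, fetch, zipcode)
    parts = data['results'][0]['address_components']
    return [zipcode, parts[1]['long_name'], parts[3]['long_name']]


async def get_cities(zipcodes, fetch=fetch_json):
    with ThreadPoolExecutor(10) as executor:
        rows = await asyncio.gather(
            *(get_city(z, executor, fetch) for z in zipcodes))
    return dict(enumerate(rows))


zip_cities = asyncio.run(get_cities(['90210', '60647'], fetch=fake_fetch))
print(zip_cities)
```

Dropping the `fetch=fake_fetch` argument makes `get_cities` fall back to the real `fetch_json`, which needs network access and a valid key. Returning each row from the coroutine and building the dict at the end also avoids mutating a module-level dict from inside the tasks.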

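I haven't done much with asyncio, but asyncio.get_event_loop() should be what you need. You obviously also have to change what your function takes as arguments, and use asyncio.wait(tasks) as per the docs: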

import asyncio
import json
import urllib.request

zips = ['90210', '60647']
zip_cities = dict()

@asyncio.coroutine
def get_cities(zipcode, idx):
    url = 'https://maps.googleapis.com/maps/api/geocode/json?key=abcdefg&address='+zipcode+'&sensor=true'
    # Run the blocking urlopen call in the loop's default executor.
    fut = loop.run_in_executor(None, urllib.request.urlopen, url)
    response = yield from fut
    string = response.read().decode('utf-8')
    data = json.loads(string)
    city = data['results'][0]['address_components'][1]['long_name']
    state = data['results'][0]['address_components'][3]['long_name']
    zip_cities.update({idx: [zipcode, city, state]})

loop = asyncio.get_event_loop()
tasks = [asyncio.async(get_cities(z, i)) for i, z in enumerate(zips)]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
print(zip_cities)
{0: ['90210', 'Beverly Hills', 'California'], 1: ['60647', 'Chicago', 'Illinois']}

I don't have Python >= 3.4.4, so I have to use asyncio.async rather than asyncio.ensure_future.

Or change the logic and build the dict from the result() of each task:

@asyncio.coroutine
def get_cities(zipcode):
    url = 'https://maps.googleapis.com/maps/api/geocode/json?key=abcdefg&address='+zipcode+'&sensor=true'
    fut = loop.run_in_executor(None,urllib.request.urlopen, url)
    response = yield  from fut
    string = response.read().decode('utf-8')
    data = json.loads(string)
    city = data['results'][0]['address_components'][1]['long_name']
    state = data['results'][0]['address_components'][3]['long_name']
    return [zipcode, city, state]

loop = asyncio.get_event_loop()
tasks = [asyncio.async(get_cities(z)) for z in zips]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
zip_cities = {i:tsk.result() for i,tsk in enumerate(tasks)}
print(zip_cities)
{0: ['90210', 'Beverly Hills', 'California'], 1: ['60647', 'Chicago', 'Illinois']}
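
For newer Python versions, the same idea of collecting return values instead of mutating a shared dict can be sketched with `asyncio.gather`, which returns results in the order of its arguments. This is an illustration under stated assumptions rather than a drop-in replacement: `KNOWN` is a hypothetical stand-in for the geocoding request so that the code runs offline, `'00000'` is a made-up zip code used to trigger a failure, and `asyncio.run` needs Python 3.7 or later.

```python
import asyncio

# Hypothetical stand-in for the geocoding request, so the sketch runs offline.
KNOWN = {'90210': ('Beverly Hills', 'California'),
         '60647': ('Chicago', 'Illinois')}


async def get_city(zipcode):
    await asyncio.sleep(0)  # placeholder for the awaited network call
    city, state = KNOWN[zipcode]  # raises KeyError for an unknown zip code
    return [zipcode, city, state]


async def get_cities(zipcodes):
    # gather returns results in the order of its arguments, regardless of
    # which coroutine finishes first. With return_exceptions=True a failed
    # lookup comes back as an exception object instead of being raised.
    results = await asyncio.gather(*(get_city(z) for z in zipcodes),
                                   return_exceptions=True)
    return {i: r for i, r in enumerate(results)
            if not isinstance(r, Exception)}


zip_cities = asyncio.run(get_cities(['90210', '00000', '60647']))
print(zip_cities)
# {0: ['90210', 'Beverly Hills', 'California'], 2: ['60647', 'Chicago', 'Illinois']}
```

Index 1 is missing from the output because the lookup for the made-up zip code failed. Without `return_exceptions=True`, the first failure would propagate out of `gather` and no dict would be built.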

If you are open to external modules, there is also a port of requests that works with asyncio.
