How to make parallel HTTP requests

Asked 2025-04-18 10:00

I have a list of 100 IDs, and I need to do a lookup for each one. Each lookup takes about 3 seconds. Here is the sequential code I need to run:

ids = [102225077, 102225085, 102225090, 102225097, 102225105, ...]
for id in ids:
    run_updates(id)

I'd like to run ten lookups at the same time, using either gevent or multiprocessing. How should I do it? I tried gevent, but it was slow:

import gevent

def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i+n]

ids = [102225077, 102225085, 102225090, 102225097, 102225105, ...]

if __name__ == '__main__':
    for list_of_ids in chunks(ids, 10):
        jobs = [gevent.spawn(run_updates(id)) for id in list_of_ids]
        gevent.joinall(jobs, timeout=200)

How can I split the ID list into parts and run ten at a time? I'm not very familiar with multiprocessing or gevent, but I'm open to trying either.
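One likely reason the gevent attempt is slow: `gevent.spawn(run_updates(id))` calls `run_updates(id)` on the spot, before `spawn` ever sees it, so the lookups still run one after another. `gevent.spawn` takes the callable and its arguments separately, as in `gevent.spawn(run_updates, id)`. The sketch below shows the same mistake with the standard library's `threading.Thread` as a stand-in for gevent, so it runs without extra packages. The `run_updates` body and the thread names are hypothetical, for illustration only.

```python
import threading

calls = []  # (id, name of the thread that actually ran the lookup)

def run_updates(id_):
    # Hypothetical stand-in for the real lookup: just record where it ran.
    calls.append((id_, threading.current_thread().name))

# Mistake: run_updates(1) executes right here, in the calling thread,
# and its return value (None) is what gets passed as the target.
wrong = threading.Thread(target=run_updates(1), name="worker-wrong")
wrong.start()
wrong.join()

# Intended: pass the callable and its argument separately,
# so the call happens inside the worker.
right = threading.Thread(target=run_updates, args=(2,), name="worker-right")
right.start()
right.join()

for id_, thread_name in calls:
    print(id_, "ran in", thread_name)
```

The first lookup is recorded under the caller's thread name rather than "worker-wrong", which is the sequential behavior described above.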

Run sequentially, processing 100 IDs takes 364 seconds.

With multiprocessing, 100 IDs take about 207 seconds, processing 5 at a time:

pool = Pool(processes=5)
pool.map(run_updates, list_of_apple_ids)

gevent's speed falls somewhere between the two:

jobs = [gevent.spawn(run_updates, apple_id) for apple_id in list_of_apple_ids]

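Is there any way to get better performance than Pool.map? My machine is decent and my network connection is fast, so this should be able to finish more quickly...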
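As a rough sanity check on those timings, here is a back-of-the-envelope lower bound. It assumes every lookup takes the same time and that the remote server does not slow down under concurrent requests. Neither is stated in the question, so treat the result as an estimate only. `ideal_wall_time` is a hypothetical helper name.

```python
import math

def ideal_wall_time(n_ids, workers, seconds_per_lookup):
    # Lower bound: lookups run in ceil(n_ids / workers) rounds,
    # each round lasting as long as a single lookup.
    return math.ceil(n_ids / workers) * seconds_per_lookup

per_lookup = 364 / 100  # measured sequential time divided by the number of IDs

for workers in (1, 5, 10):
    estimate = round(ideal_wall_time(100, workers, per_lookup), 1)
    print(workers, "workers:", estimate, "s")
```

Under those assumptions, five workers would need about 73 seconds and ten about 36 seconds. The measured 207 seconds with five processes is well above that bound, which suggests the lookups are not fully overlapping, though the question does not give enough detail to say why.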


2 Answers

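from multiprocessing import Process
from random import random

def do_thing(arg):
    print(arg)  # Here's where you'd do the lookup

if __name__ == '__main__':
    ids = [random() for _ in range(100)]  # make some fake ids, whatever

    while ids:
        # Take the next batch of ten and start one process per ID.
        curs, ids = ids[:10], ids[10:]
        procs = [Process(target=do_thing, args=(c,)) for c in curs]
        for proc in procs:
            proc.start()  # start() launches a child process; run() would execute in this one
        # Wait for the whole batch before starting the next.
        for proc in procs:
            proc.join()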

That's roughly how I'd do it.

Answered 2025-04-18 by Python大师


Take a look at the grequests library. You can do something like this:

import grequests


# chunks() and ids are the ones defined in the question.
for list_of_ids in chunks(ids, 10):
    urls = ['http://www.example.com/id?=' + str(id) for id in list_of_ids]
    requests = (grequests.get(url) for url in urls)
    responses = grequests.map(requests)

    for response in responses:
        if response is not None:  # grequests.map yields None for a request that failed
            print(response.content)

I know this may disrupt your design, since your requests live inside a method called run_updates, but I think it's still worth a try.
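If restructuring the code around URLs is not attractive, one alternative that leaves run_updates untouched is a thread pool from the standard library. This is a different technique from grequests, offered only as a minimal sketch. The `run_updates` body here is a hypothetical stand-in that sleeps instead of making a request, the IDs are placeholders, and the choice of ten workers simply mirrors the question.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_updates(id_):
    # Hypothetical stand-in: the real function would perform the ~3 s lookup.
    time.sleep(0.05)
    return id_

ids = list(range(100))  # placeholder IDs, not the real ones

start = time.perf_counter()
# Ten worker threads pull IDs as they become free,
# so no manual chunking is needed.
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(run_updates, ids))  # results keep the input order
elapsed = time.perf_counter() - start

print(len(results), "lookups in", round(elapsed, 2), "s")
```

Threads tend to suit this kind of job because each lookup spends most of its time waiting on the network, and a worker picks up the next ID as soon as it finishes instead of waiting for a whole batch of ten to complete.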

Answered 2025-04-18 by Python大师
