How do I parse a paginated website with aiohttp in Python when I don't know how many pages the site actually has?

-1 votes

0 answers

24 views

Asked 2025-04-12 07:31

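I know how to do this with the Requests library: a while True loop that breaks as soon as I hit an empty page or a 404 error. With aiohttp, though, I schedule the pages with gather, and when a page comes back empty I cancel all the tasks, which means I lose the tasks that have not finished yet.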

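import asyncio

import aiohttp
from bs4 import BeautifulSoup as BS

headers = {}  # placeholder: the actual request headers are not shown in the question


class SomeError(Exception):
    """Placeholder exception raised when a page contains no goods."""


async def get_goods_from_pages(session, page):
    url = f'https://somewebsite?page={page}'
    async with session.get(url, headers=headers) as r:
        soup = BS(await r.text(), 'lxml')

    all_goods = soup.find_all('div', class_='js_category-list-item')
    if all_goods:
        for el in all_goods:
            print(el)
    else:
        # An empty page is treated as the end of the pagination
        raise SomeError

# collect all tasks function
async def get_pages_info():
    tasks = []
    async with aiohttp.ClientSession() as session:
        for page in range(1, 150):
            task = asyncio.create_task(get_goods_from_pages(session, page))
            tasks.append(task)
        try:
            group = asyncio.gather(*tasks)
            await group
        except Exception:
            # Cancels every task that is still pending, so their pages are lost
            group.cancel()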

I also tried a while True loop that awaits the function on every iteration, but parsing that way is very slow.
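One possible middle ground between the two attempts above is to fetch the pages in fixed-size batches: pages inside a batch run concurrently, and the loop stops after the first batch that contains an empty page, so nothing gets cancelled halfway. The following is only a minimal sketch under stated assumptions. `crawl_in_batches`, `fetch_page`, `fake_fetch`, and `batch_size` are hypothetical names that do not appear in the question, `fetch_page` is assumed to return the list of items on a page (an empty list past the last page) rather than raise, and an empty page is assumed to mean the end of the pagination. A fake fetch function stands in for the real request so the example runs without network access.

```python
import asyncio
from itertools import count


async def crawl_in_batches(fetch_page, batch_size=10):
    """Fetch pages batch by batch until a batch contains an empty page.

    fetch_page is a hypothetical coroutine function: it takes a page number
    and returns the list of items found on that page (empty past the end).
    """
    results = {}
    for start in count(1, batch_size):
        pages = range(start, start + batch_size)
        # Pages inside one batch are requested concurrently.
        batch = await asyncio.gather(*(fetch_page(p) for p in pages))
        reached_end = False
        for page, items in zip(pages, batch):
            if items:
                results[page] = items
            else:
                # Assumption: an empty page marks the end of the pagination.
                reached_end = True
        if reached_end:
            return results


async def fake_fetch(page, last_page=23):
    # Stand-in for a real request: pages 1..last_page hold one item each.
    await asyncio.sleep(0)
    return [f"item-{page}"] if page <= last_page else []


pages = asyncio.run(crawl_in_batches(fake_fetch, batch_size=10))
print(len(pages), max(pages))
```

With aiohttp, `fetch_page` would presumably be a coroutine that wraps `session.get(...)`, parses the response, and returns the matching elements instead of raising. The trade-off of this design is that up to `batch_size - 1` requests past the last page are wasted, in exchange for never discarding a page that was already in flight.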

asynchronous-programming web-scraping coroutines aiohttp pagination requests
