Chaining multiple ajax requests on a website to show more pages and get the full list on a single page

0 votes
2 answers
38 views
Asked 2025-04-12 15:08

When browsing the site, I want clicking the Show More button to load the full page content; I expect roughly 8000 elements to be displayed. The Show More button sends a POST request to 'https://icomarks.ai/icos/ajax_more'.

I tried two approaches:

import requests
from bs4 import BeautifulSoup

with requests.Session() as session:

    req = session.get('https://icomarks.ai/icos/')
    req = session.post('https://icomarks.ai/icos/ajax_more')
    req = session.post('https://icomarks.ai/icos/ajax_more')    # just for a couple
    soup = BeautifulSoup(req.content, "html.parser")

import requests
from bs4 import BeautifulSoup

s = requests.Session()
t=s.post('https://icomarks.ai/icos/')
r=s.get('https://icomarks.ai/icos/ajax_more')
r=s.get('https://icomarks.ai/icos/ajax_more')    # just for a couple
soup = BeautifulSoup(r.content, "html.parser")

but neither worked.

I expected soup.find_all('a', class_="icoListItem__title") to find the elements of the list that should get loaded:

[<a class="icoListItem__title" href="/ico/5th-scape">5th Scape <sup class="sup_is_premium">★ Promoted</sup> <sup class="sup_views">128 Views</sup>
 </a>,
 <a class="icoListItem__title" href="/ico/pood-inu">Pood INU <sup class="sup_is_premium">★ Promoted</sup> <sup class="sup_views">330 Views</sup>
 </a>,
 <a class="icoListItem__title" href="/ico/etuktuk">eTukTuk <sup class="sup_is_premium">★ Promoted</sup> <sup class="sup_views">794 Views</sup>
...

2 Answers

0

Neither approach can work, because each time you overwrite the variable that stores the result: req and r. At best you keep only the last response. You need to process each response right after the request, or store them all for later processing.

Put simply, what you are doing now amounts to:

a = 1
a = 2
a = 3

Of course the final result is 3, and the earlier values are gone.
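The fix is to accumulate each result into a list instead of reassigning the same name; a minimal illustration:

```python
results = []
for value in (1, 2, 3):
    results.append(value)   # each value is kept, not overwritten

print(results)  # [1, 2, 3]
```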

A simple sketch of such a code structure could look like this:

all_them_links = []

with requests.Session() as session:      # one session, so cookies persist between requests
    while True:
        req = session.post('https://icomarks.ai/icos/ajax_more')
        soup = BeautifulSoup(req.content, "html.parser")
        found_links = soup.find_all('a', class_="icoListItem__title")
        if not found_links:              # no more results: stop
            break
        all_them_links.extend(found_links)
0

Improving on srn's answer. The response is actually a JSON object, so I fixed its parsing. The object has a "content" attribute that holds the actual HTML; the code also prints the href attribute of each link (<a>) element.

import json
from pprint import pprint

import requests
from bs4 import BeautifulSoup


def main():
    all_them_links = []

    # Keep one session for the whole loop so server-side paging state persists;
    # the original for/else broke out of the while loop after the first pass.
    with requests.Session() as session:
        while True:
            req = session.post('https://icomarks.ai/icos/ajax_more')
            response = json.loads(req.content)
            pprint(response['offset'])
            soup = BeautifulSoup(response["content"], "html.parser")
            found_links = soup.find_all("a", class_="icoListItem__title")
            if not found_links:          # empty page: we are done
                break
            for a in found_links:
                all_them_links.append([a["href"], a.get_text()])

    pprint(all_them_links)


main()
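Why parsing response["content"] succeeds where parsing the raw bytes as HTML did not can be shown offline. The payload below is made up; only the "offset" and "content" keys are taken from this answer, everything else is an assumption:

```python
import json

from bs4 import BeautifulSoup

# A made-up payload shaped like the response this answer describes:
# the HTML fragment lives inside the "content" key of a JSON object.
raw = json.dumps({
    "offset": 20,
    "content": '<a class="icoListItem__title" href="/ico/example">Example</a>',
}).encode()

# Decode the JSON first, then hand only the embedded HTML to BeautifulSoup.
data = json.loads(raw)
soup = BeautifulSoup(data["content"], "html.parser")
links = [[a["href"], a.get_text()] for a in
         soup.find_all("a", class_="icoListItem__title")]
print(links)  # [['/ico/example', 'Example']]
```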
