Finding the number of pages with Python BeautifulSoup

import requests
import re
from bs4 import BeautifulSoup

r = requests.get('http://store.steampowered.com/tags/en-us/RPG/')
c = r.content
soup = BeautifulSoup(c, 'html.parser')
total_pages = soup.find_all("span",{"class":"paged_items_paging_pagelink"})[-1].text

2 answers

User

Answer 1 · edited 2024-05-16 07:53:35

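If you inspect the page source, the content you need is not there. That means it is generated dynamically by JavaScript.

The page numbers sit inside the <span id="NewReleases_links"> tag, but in the page source the HTML shows only this: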

<span id="NewReleases_links"></span>

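The simplest way to handle this is to use Selenium.

However, the text "Showing 1-20 of 213 results" is present in the page source, so you can scrape that and compute the number of pages from it.

Required HTML: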

^{pr2}$

Code:

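import math

import requests
from bs4 import BeautifulSoup

r = requests.get('http://store.steampowered.com/tags/en-us/RPG/')
soup = BeautifulSoup(r.text, 'lxml')  # requires the lxml package

def get_pages_no(soup):
    # Total number of results, e.g. 213
    total_items = int(soup.find('span', id='NewReleases_total').text)
    # Last item shown on the first page, e.g. 20, which equals the page size
    items_per_page = int(soup.find('span', id='NewReleases_end').text)
    # Round up so that a partly filled last page is still counted
    return math.ceil(total_items / items_per_page)

print(get_pages_no(soup))
# printed 11 when the page reported 213 results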
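The arithmetic can also be checked offline, without a request. The following is a minimal sketch that takes the summary text quoted earlier ("Showing 1-20 of 213 results") and derives the page count from it. The function name `pages_from_summary`, the regular expression, and the handling of thousands separators are illustrative assumptions, not part of the original answer. It also assumes the summary comes from a full page such as the first one, since the last page may show fewer items.

```python
import math
import re

# Pattern for a summary line such as "Showing 1-20 of 213 results".
# Accepting commas in the total (e.g. "1,000") is an assumption.
SUMMARY_RE = re.compile(r"Showing\s+(\d+)\s*-\s*(\d+)\s+of\s+([\d,]+)\s+results")


def pages_from_summary(text):
    """Return the page count implied by a 'Showing A-B of N results' line."""
    match = SUMMARY_RE.search(text)
    if match is None:
        raise ValueError("no 'Showing A-B of N results' summary found")
    start, end = int(match.group(1)), int(match.group(2))
    total = int(match.group(3).replace(",", ""))
    per_page = end - start + 1  # "1-20" means 20 items on this page
    # Round up: a partly filled last page still counts as a page.
    return math.ceil(total / per_page)


print(pages_from_summary("Showing 1-20 of 213 results"))  # 11
print(pages_from_summary("Showing 1-20 of 205 results"))  # 11, where round() would give 10
```

The second call shows why rounding up matters: 205 / 20 is 10.25, which round() turns into 10 even though an eleventh page holds the remaining five results.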

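(Note: I would still recommend Selenium, because most of this site's content is generated dynamically. Collecting all the data this way would be painful.)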
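For reference, the Selenium route might look roughly like the sketch below. It rests on several assumptions: that Chrome and a matching driver are installed, that the rendered page links carry the class paged_items_paging_pagelink used in the question, and that they appear inside the NewReleases_links span. The function names are hypothetical, and the code has not been checked against the live site.

```python
def last_page_number(link_texts):
    """Return the largest page number among pagination link texts."""
    stripped = (text.strip() for text in link_texts)
    numbers = [int(text) for text in stripped if text.isdigit()]
    if not numbers:
        raise ValueError("no numeric page links found")
    return max(numbers)


def count_pages_with_selenium(url, timeout=10):
    """Load the page in a real browser and read the rendered page links."""
    # Imported here so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome and its driver are available
    try:
        driver.get(url)
        # Assumed selector: page links inside <span id="NewReleases_links">.
        selector = "#NewReleases_links .paged_items_paging_pagelink"
        # Wait until JavaScript has inserted at least one page link.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, selector))
        )
        links = driver.find_elements(By.CSS_SELECTOR, selector)
        return last_page_number([link.text for link in links])
    finally:
        driver.quit()


# Example call (opens a browser window):
# count_pages_with_selenium('http://store.steampowered.com/tags/en-us/RPG/')
```

Keeping the number extraction in its own function means non-numeric links such as "..." or arrows are ignored, and that part can be tested without a browser.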

User

Answer 2 · edited 2024-05-16 07:53:35

Another, faster approach that does not need BeautifulSoup:

import math

import requests

# This endpoint returns the query result in JSON format
url = "http://store.steampowered.com/contenthub/querypaginated/tags/NewReleases/render/?query=&start=20&count=20&cc=US&l=english&no_violence=0&no_sex=0&v=4&tag=RPG"
r = requests.get(url)

# total_count = number of records, 20 = number of records per page
# Round up so that a partly filled last page is still counted
print(math.ceil(r.json()['total_count'] / 20))

11
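Once the total is known, the same endpoint can presumably be walked page by page. The sketch below builds the query URL from its parameters and lists the start value for each page. It assumes that start is a zero-based item offset and count is the page size, which the URL in this answer suggests but does not confirm. The helper names `build_page_url` and `page_starts` are hypothetical.

```python
import math
from urllib.parse import urlencode

BASE_URL = "http://store.steampowered.com/contenthub/querypaginated/tags/NewReleases/render/"


def build_page_url(tag, start, count=20):
    """Build the query URL for one slice of results (parameters as in the answer)."""
    params = {
        "query": "", "start": start, "count": count, "cc": "US", "l": "english",
        "no_violence": 0, "no_sex": 0, "v": 4, "tag": tag,
    }
    return BASE_URL + "?" + urlencode(params)


def page_starts(total_count, count=20):
    """Return the start offset of every page, assuming zero-based offsets."""
    pages = math.ceil(total_count / count)  # round up for a partial last page
    return [page * count for page in range(pages)]


starts = page_starts(213)
print(len(starts))   # 11
print(starts[-1])    # 200
print(build_page_url("RPG", starts[1]))
```

Each URL can then be fetched with requests.get as shown above. With 213 records the last page starts at offset 200 and holds the remaining 13.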
