Ranker.com Python BeautifulSoup scraper does not scrape the whole site

Posted 2024-06-06 07:53:04


I am writing a BeautifulSoup scraper that should pull 100 names from a list page on ranker.com. The code is below:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.ranker.com/crowdranked-list/best-anime-series-all-time')
soup = BeautifulSoup(r.text, 'html.parser')

for p in soup.find_all('a', class_='gridItem_name__3zasT gridItem_nameLink__3jE6V'):
    print(p.text) 

This runs and produces the following output:

Attack on Titan
My Hero Academia
Naruto: Shippuden
Hunter x Hunter (2011)
One-Punch Man
Fullmetal Alchemist: Brotherhood
One Piece
Naruto
Tokyo Ghoul
Assassination Classroom
The Seven Deadly Sins
Parasyte: The Maxim
Code Geass
Haikyuu!! 
Your Lie in April
Noragami
Akame ga Kill!
Dragon Ball
No Game No Life
Fullmetal Alchemist
Dragon Ball Z
Cowboy Bebop
Steins;Gate
Mob Psycho 100
Fairy Tail

I want the program to extract 100 items from the list, but it only returns 25. Can anyone help?


1 answer
User
#1 · Posted 2024-06-06 07:53:04

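The additional items are loaded by an API call that takes offset and limit parameters, which select the next batch of 25 results. The initial HTML only contains the first 25 items, which is why the BeautifulSoup loop stops there. You can either drop both parameters to get up to 200 results, or keep limit and set it to 100. Everything in the API call other than the endpoint can be ignored.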

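import requests

# Call the list-items endpoint directly; limit=100 asks for 100 items in one response.
r = requests.get('https://api.ranker.com/lists/538997/items?limit=100')
r.raise_for_status()  # fail early on an HTTP error instead of parsing an error page
data = r.json()['listItems']
# Map each item's rank to its name.
ranked_titles = {i['rank']: i['name'] for i in data}
print(ranked_titles)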
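
If you would rather keep the batches of 25 and walk through them with offset and limit, the paging could look roughly like the sketch below. It is only an illustration: collect_items, fetch_ranker_page, and fake_page are hypothetical helper names, and the assumption that offset counts items from the start of the list has not been checked against the live API. The endpoint and the 'listItems', 'rank', and 'name' keys are the ones used above.

```python
from typing import Callable, List

# Endpoint taken from the answer above.
API_URL = 'https://api.ranker.com/lists/538997/items'


def fetch_ranker_page(offset: int, limit: int) -> List[dict]:
    """Fetch one batch of items (hypothetical helper; makes a network call).

    Assumption: 'offset' is the number of items to skip from the top of the list.
    """
    import requests  # imported lazily so the paging helper below also works offline

    r = requests.get(API_URL, params={'offset': offset, 'limit': limit}, timeout=10)
    r.raise_for_status()
    return r.json()['listItems']


def collect_items(fetch_page: Callable[[int, int], List[dict]],
                  total: int = 100, page_size: int = 25) -> List[dict]:
    """Request batches at increasing offsets until `total` items are
    collected or a batch comes back empty."""
    items: List[dict] = []
    offset = 0
    while len(items) < total:
        limit = min(page_size, total - len(items))
        batch = fetch_page(offset, limit)
        if not batch:  # no more items on the server side
            break
        items.extend(batch)
        offset += len(batch)
    return items[:total]


# Offline stand-in for the API (made-up data) to exercise the paging logic.
FAKE_LIST = [{'rank': n, 'name': f'Title {n}'} for n in range(1, 61)]


def fake_page(offset: int, limit: int) -> List[dict]:
    return FAKE_LIST[offset:offset + limit]


sample = collect_items(fake_page, total=100, page_size=25)
ranked_titles = {i['rank']: i['name'] for i in sample}
print(len(ranked_titles))
```

To run this against the real list, pass fetch_ranker_page instead of fake_page, for example collect_items(fetch_ranker_page, total=100). Stopping on an empty batch means the loop also ends cleanly when a list has fewer items than requested.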
