Requests object doesn't give any output

Posted 2024-04-26 19:04:12


I'm trying to parse images from the Reddit website using bs4 and requests, but I can't figure out how to do it. Here is my code:

from bs4 import BeautifulSoup
import requests

source = requests.get('https://www.reddit.com/r/programmingmemes/').text  #requests object as text
soup = BeautifulSoup(source, 'lxml')

img = soup.find('div', class_='_3Oa0THmZ3f5iZXAQ0hBJ0k')  # finding first post's class
div = img.find('div')  #finding 'div'
src = div.find('src')  # finding 'src'
print(src)

I expect the output to be:

<div> <img alt="Post image" class="_2_tDEnGMLxpM6uOa2kaDB3 ImageBox-image media-element _1XWObl-3b9tPy64oaG6fax" src="preview.redd.it/ik1g60hzoqc61.jpg? width=640&amp;crop=smart&amp;auto=webp&amp;s=c5fedaba3e5627cf8fcdd008317ac39789d71abc" style="max-height:512px"/> </div>

2 Answers

You need to pass headers to requests.get() to get a proper response:

from bs4 import BeautifulSoup
import requests
headers = {
"origin": "https://www.reddit.com",
"referer": "https://www.reddit.com/r/programmingmemes/",
# "sec-ch-ua": '"Google Chrome";v="87", " Not;A Brand";v="99", "Chromium";v="87"',
# "sec-ch-ua-mobile": "?0",
"sec-fetch-dest": "empty",
"sec-fetch-mode": "cors",
"sec-fetch-site": "same-origin",
"user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36"
}
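source = requests.get('https://www.reddit.com/r/programmingmemes/', headers=headers)
print(source.status_code)  # 200 means the request was accepted
soup = BeautifulSoup(source.text, 'lxml')

# container of the first post (these class names are generated and may change)
img = soup.find('div', class_='_3JgI-GOrkmyIeDeyzXdyUD _2CSlKHjH7lsjx0IpjORx14')
link = img.a["href"]  # href of the first <a> inside the container
print(link)
image = img.img["src"]  # src attribute of the first <img> inside the container
print(image)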
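
The snippet above fails with an error if the request is rejected or if the class names have changed, because soup.find() then returns None. Below is a minimal sketch of the same idea with those cases checked. The helper names fetch_html and first_post_image are hypothetical, the timeout value is arbitrary, and it is only an assumption that a browser-like user-agent alone is enough for the site to answer. The alt="Post image" lookup is taken from the expected output in the question and may not match the current page.

```python
import requests
from bs4 import BeautifulSoup

# Assumption: a browser-like user-agent is sufficient; the site may
# require further headers or block scripted requests entirely.
HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36"
    )
}


def fetch_html(url, headers=HEADERS, timeout=10):
    """Download a page and raise an exception on an HTTP error status."""
    response = requests.get(url, headers=headers, timeout=timeout)
    response.raise_for_status()
    return response.text


def first_post_image(html):
    """Return the src of the first <img alt="Post image">, or None."""
    soup = BeautifulSoup(html, "html.parser")
    img = soup.find("img", alt="Post image")
    if img is None:
        return None
    return img.get("src")


# Usage (performs a network request, so it is left commented out):
# html = fetch_html("https://www.reddit.com/r/programmingmemes/")
# print(first_post_image(html))
```

Matching on the alt attribute instead of a generated class name is a design choice made here for illustration; neither approach is guaranteed to survive a redesign of the page.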

You are using the wrong class name. I think _2_tDEnGMLxpM6uOa2kaDB3 should be the correct one (there is no element with the class name _3Oa0THmZ3f5iZXAQ0hBJ0k on that site).
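
Apart from the class name, the question's code calls find('src'), which searches for a tag named src; src is an attribute of the img tag, so that call returns None. A small offline sketch of the difference, using a shortened copy of the expected output from the question as input (the HTML string is trimmed for illustration and is not fetched from the site):

```python
from bs4 import BeautifulSoup

# Shortened copy of the <img> element quoted in the question.
html = (
    '<div><img alt="Post image" '
    'class="_2_tDEnGMLxpM6uOa2kaDB3 ImageBox-image media-element" '
    'src="preview.redd.it/ik1g60hzoqc61.jpg" '
    'style="max-height:512px"/></div>'
)
soup = BeautifulSoup(html, "html.parser")

div = soup.find("div")
missing = div.find("src")  # looks for a <src> tag, which does not exist

# class_ with a single name matches elements that carry that class
# among several, so the full class string is not needed.
img = soup.find("img", class_="_2_tDEnGMLxpM6uOa2kaDB3")
src = img["src"]  # attributes are read with dictionary-style access

print(missing)
print(src)
```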
