How to download an image captcha from a specific URL using requests.Session

Posted 2024-04-26 23:11:22


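Hi everyone, I'm trying to fetch the image captcha from a website so that I can scrape it. My problem is that the URL for the captcha image contains a parameter whose origin I can't find. So I started using parser.xpath, but it doesn't work. Here is my code: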

import requests, io, re
from PIL import Image
from lxml import html
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebkit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36",
}
session = requests.Session()
login_url = 'https://www.sat.gob.pe/WebSiteV8/popupv2.aspx?t=6'
login_form_res = session.get(login_url, headers=headers)
myhtml = login_form_res.text
evalu = ''
for match in re.finditer(r'(mysession=)(.*?)(")', myhtml):
    evalu = myhtml[match.start():match.end()]
    evalu = evalu.replace("mysession=", "")
    evalu = evalu.replace('"', '')
    print(evalu)

url_infractions = 'https://www.sat.gob.pe/VirtualSAT/modulos/RecordConductor.aspx?mysession=' + evalu
login_form_res = session.get(url_infractions, headers=headers)
myhtml = login_form_res.text
parser = html.fromstring(login_form_res.text)
idPic = parser.xpath('//img[@class="captcha_class"]/@src')
urlPic = "https://www.sat.gob.pe/VirtualSAT" + idPic[0].replace("..","")
print(urlPic)

image_content = session.get(urlPic, headers=headers)
image_file = io.BytesIO(image_content)
image = Image.open(image_file).convert('RGB').content
image.show()

As a result, I get an exception: TypeError: a bytes-like object is required, not 'Response'. I'm confused. I would really appreciate your help. Thanks in advance.
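The TypeError most likely comes from passing the whole Response object returned by session.get() to io.BytesIO, which expects raw bytes. Those bytes are available as response.content. Separately, convert('RGB') already returns an Image, so the trailing .content on that line would fail next. Below is a minimal sketch of the last step under those assumptions. The helper names fetch_image and captcha_url are hypothetical, and urljoin is swapped in for the manual replace("..", "") string handling as one possible way to resolve the relative src. It has not been checked against the live site.

```python
import io
from urllib.parse import urljoin

import requests
from PIL import Image


def captcha_url(page_url, src):
    """Resolve a relative <img src> (for example '../x/y.aspx') against the page URL."""
    return urljoin(page_url, src)


def fetch_image(session, url, headers=None):
    """Download `url` with the given session and return it as an RGB PIL image."""
    response = session.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # stop on 4xx/5xx instead of parsing an error page
    # response.content is the body as bytes; the Response object itself is not bytes-like
    image_file = io.BytesIO(response.content)
    # convert() returns an Image, so no trailing .content is needed
    return Image.open(image_file).convert("RGB")
```

With the variables from the code above, usage would look roughly like urlPic = captcha_url(url_infractions, idPic[0]), then image = fetch_image(session, urlPic, headers) and image.show(). Reusing the same session matters here, since the captcha is probably tied to the session cookies set by the earlier requests.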

