Unable to get HTML source after a successful login

Posted 2024-04-24 10:46:35


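I want to fetch the HTML source of some web pages. Using requests.Session, I can apparently log in successfully (the POST returns a 200 response). But after that, when I try to fetch the HTML source of certain pages, I get the login page's HTML source again. Here is my code: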

import requests
from bs4 import BeautifulSoup
from random import randint
from time import sleep

USERNAME = "email"
PASSWORD = "password"

LOGIN_URL = "login_url"
URL = "base_url"
FILEPATH = "File_Location"

with requests.Session() as s:

    r = s.get(LOGIN_URL)
    soup = BeautifulSoup(r.content, "lxml")

    hidden = soup.find_all("input", {'type':'hidden'})
    target = LOGIN_URL + soup.find("form")['action']
    payload = {x["name"]: x["value"] for x in hidden}

    #add login creds to the dict
    payload["user[email]"] = USERNAME
    payload["user[password]"] = PASSWORD
    r = s.post(target, data=payload)
    print(r)  # -> <Response [200]>

    for i in range(587, 608):
        sleep(randint(1,5))
        url1 = URL + str(i)
        result = s.get(url1, headers = dict(referer = url1))
        fn = FILEPATH + str(i) + ".html"
        data = result.text
        soup = BeautifulSoup(data, "html.parser")  # -> This gives me the login page's source code
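
A 200 response to the login POST only shows that the server returned a page. It does not show that the credentials were accepted, since many sites re-render the login form with a 200 when a login is rejected. The sketch below is one possible way to check this, not a confirmed fix for this site. It uses only the standard library so it runs on its own; `FormScanner`, `scan`, `build_login_request`, and `looks_like_login_page` are hypothetical helper names, and `https://example.com/users/sign_in` is a placeholder URL. It also resolves the form's `action` with `urljoin`, because plain concatenation of `LOGIN_URL` and an absolute-path `action` would produce a wrong target, assuming the form uses such an action.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormScanner(HTMLParser):
    """Records the first <form> action and every <input> tag on the page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.inputs = []  # one attribute dict per <input>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action", "")
        elif tag == "input":
            self.inputs.append(attrs)


def scan(html):
    """Parse an HTML string and return the populated scanner."""
    scanner = FormScanner()
    scanner.feed(html)
    return scanner


def build_login_request(login_url, html, username, password):
    """Return (target_url, payload) for the login POST.

    The field names user[email] and user[password] are taken from the
    question's code; everything else here is an illustrative assumption.
    """
    form = scan(html)
    # urljoin handles relative, absolute-path, and full-URL actions.
    target = urljoin(login_url, form.action or "")
    payload = {
        i["name"]: i.get("value") or ""
        for i in form.inputs
        if i.get("type") == "hidden" and i.get("name")
    }
    payload["user[email]"] = username
    payload["user[password]"] = password
    return target, payload


def looks_like_login_page(html):
    """Heuristic: a page containing a password field is probably a login form."""
    return any(i.get("type") == "password" for i in scan(html).inputs)
```

In the script above, calling `looks_like_login_page(r.text)` right after `s.post(target, data=payload)` would indicate whether the login itself was rejected despite the 200. Printing `r.url`, `r.history`, and `s.cookies` at the same point can also help show whether a redirect happened and whether a session cookie was set.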