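I want to fetch the HTML source of some web pages. Using requests.Session, I can log in to the site successfully (I get a 200 response). But after that, when I try to fetch the HTML source of certain pages, I get the login page's HTML source again. Here is my code: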
import requests
from bs4 import BeautifulSoup
from random import randint
from time import sleep
USERNAME = "email"
PASSWORD = "password"
LOGIN_URL = "login_url"
URL = "base_url"
FILEPATH = "File_Location"
with requests.Session() as s:
    r = s.get(LOGIN_URL)
    soup = BeautifulSoup(r.content, "lxml")
    hidden = soup.find_all("input", {"type": "hidden"})
    target = LOGIN_URL + soup.find("form")["action"]
    payload = {x["name"]: x["value"] for x in hidden}
    # Add the login credentials to the dict
    payload["user[email]"] = USERNAME
    payload["user[password]"] = PASSWORD
    r = s.post(target, data=payload)
    print(r)  # -> <Response [200]>
    for i in range(587, 608):
        sleep(randint(1, 5))
        url1 = URL + str(i)
        result = s.get(url1, headers=dict(referer=url1))
        fn = FILEPATH + str(i) + ".html"
        data = result.text
        soup = BeautifulSoup(data, "html.parser")  # -> This gives me the login page's source code
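A 200 status code only means the server answered the POST. Many sites re-render the login form with a 200 when the credentials or hidden fields are rejected, so the status alone does not show that the session is authenticated. The sketch below is a hypothetical diagnostic, not a confirmed fix. `form_target` and `looks_like_login_page` are illustrative names, and the check assumes the login page can be recognised by an `<input type="password">` field. It uses only the standard library (`urllib.parse.urljoin`, `html.parser.HTMLParser`) so it runs without extra packages. With BeautifulSoup the equivalent check would be `soup.find("input", {"type": "password"}) is not None`. It also shows why resolving the form's `action` with `urljoin` may be safer than `LOGIN_URL + action`: plain concatenation doubles the path when `action` is already an absolute path.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


def form_target(login_url: str, action: str) -> str:
    """Resolve a form's action attribute against the page it came from.

    login_url + action would give ".../sign_in/users/sign_in" for an
    absolute action like "/users/sign_in"; urljoin handles absolute,
    relative, and empty actions correctly.
    """
    return urljoin(login_url, action)


class _PasswordFieldFinder(HTMLParser):
    """Sets found=True if the document contains <input type="password">."""

    def __init__(self) -> None:
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag == "input":
            attr_map = dict(attrs)
            if (attr_map.get("type") or "").lower() == "password":
                self.found = True
    # Self-closing <input ... /> is routed here by the default
    # handle_startendtag, so no extra handler is needed.


def looks_like_login_page(html: str) -> bool:
    """Heuristic (assumption): a page with a password field is the login form."""
    finder = _PasswordFieldFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```

Assuming the variables from the code above, one way to use it: after the POST, check `looks_like_login_page(r.text)` and print `r.url` (the final URL after any redirects) to see whether the server sent you back to the login form. Inside the loop, the same check on `result.text` shows which page IDs came back as the login page.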