Web scraping a site that requires login, using Python

Posted 2024-04-20 14:45:23


I'm new to web scraping, and as a starting point I'm trying to log in to Wikipedia with Python.

I can't seem to get it to work. Here is my code:

import requests
from bs4 import BeautifulSoup

LOGINURL = 'https://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=Main+Page'

REQUESTURL = 'https://en.wikipedia.org/wiki/Special:Notifications'

session = requests.Session()
soup = BeautifulSoup(session.get(LOGINURL).text, 'html.parser')

token = soup.find("input", {"name": "wpLoginToken"})["value"]

details = {'wpName1': input('input username'), 'wpPassword1': input('input pw'), 'wpLoginToken': token}

post = session.post(LOGINURL, data=details)

r = session.get(REQUESTURL).text
print(r)

When I print r, it is still the login page.


2 Answers

You can use selenium instead of requests to log in to the site. Here is how:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get('https://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=Main+Page')

# find_element_by_xpath was removed in Selenium 4; use find_element with a By locator
uname = driver.find_element(By.XPATH, '//*[@id="wpName1"]')
uname.click()
uname.send_keys('Username')

pswrd = driver.find_element(By.XPATH, '//*[@id="wpPassword1"]')
pswrd.click()
pswrd.send_keys('Password')

# Submit the form by clicking the login button
driver.find_element(By.XPATH, '//*[@id="wpLoginAttempt"]').click()

Try this. It should work, but I couldn't test it because I don't have an account there.

import requests
from bs4 import BeautifulSoup

LOGINURL = "https://en.wikipedia.org/w/index.php?title=Special:UserLogin&returnto=Main+Page"
REQUESTURL = 'https://en.wikipedia.org/wiki/Special:Notifications'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36'
    r = s.get(LOGINURL)
    soup = BeautifulSoup(r.text,"html.parser")
    payload = {i['name']: i.get('value', '') for i in soup.select('input[name]')}
    payload['wpName'] = 'yourusername'  # replace with your username
    payload['wpPassword'] = 'yourpassword'  # replace with your password
    payload['title'] = 'Special:UserLogin'
    payload['wploginattempt'] = 'Log in'
    # Drop the search-box fields picked up from the page; the default avoids a KeyError if one is absent
    payload.pop('fulltext', None)
    payload.pop('search', None)
    payload.pop('go', None)
    s.post(LOGINURL,data=payload)
    r = s.get(REQUESTURL).text
    print(r)
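
The payload-building step above can be tried offline. The following is only a minimal sketch: SAMPLE_FORM is a hypothetical, much-simplified stand-in for the login page, InputCollector and build_payload are illustrative names, and the standard-library html.parser is swapped in for BeautifulSoup so that it runs without a network connection or extra packages. It shows that form fields are submitted under their name attribute (wpName, wpPassword) rather than their id (wpName1, wpPassword1), and that hidden fields such as wpLoginToken are carried over as they are.

```python
from html.parser import HTMLParser

# Hypothetical, simplified login form; the real page contains more fields.
SAMPLE_FORM = """
<form method="post">
  <input id="wpName1" name="wpName" value="">
  <input id="wpPassword1" name="wpPassword" type="password">
  <input type="hidden" name="wpLoginToken" value="abc123">
  <input id="searchInput" name="search">
</form>
"""


class InputCollector(HTMLParser):
    """Collect {name: value} for every <input> that has a name attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        name = attr.get("name")
        if name is not None:
            # A missing or valueless "value" attribute becomes an empty string
            self.fields[name] = attr.get("value") or ""


def build_payload(html, username, password):
    """Build POST data keyed by input name, keeping hidden fields such as the token."""
    parser = InputCollector()
    parser.feed(html)
    payload = dict(parser.fields)
    payload["wpName"] = username
    payload["wpPassword"] = password
    payload.pop("search", None)  # not part of the login form proper
    return payload


payload = build_payload(SAMPLE_FORM, "yourusername", "yourpassword")
print(sorted(payload))
```

With a real session, the same dictionary would be passed as data= to s.post(LOGINURL, ...), as in the answer above.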
