Requests and web scraping: POST does not return the expected result

Posted 2024-04-24 05:06:46

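I am trying to retrieve search results from this booking website. I first send a GET request and use BeautifulSoup to extract the _csrf token that the POST request data requires. I then send the POST request with the relevant data, but instead of the search results I get the trip selection form back again. What am I doing wrong?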

token = get_csrf_token()
params = set_params(token)
res = get_request(params)

def get_csrf_token():
    url = 'https://www.booking.alilaurogruson.it/booking/services/datas'
    r = requests.get(url, verify=False, timeout=10)
    csrf = search_csrf_token(r)
    return csrf

def search_csrf_token(r):
    soup = BeautifulSoup(r.content,'html.parser')
    csrf = soup.find('input', attrs={'name': '_csrf'})['value']
    return csrf

def set_params( csrf):
    params = urllib.parse.urlencode({
        "inputSearchFerriesBean.areasGoing": "1002",
        "inputSearchFerriesBean.tradesGoing": "12",
        "inputSearchFerriesBean.ret": "true",
        "inputSearchFerriesBean.areasReturn": "1007",
        "inputSearchFerriesBean.tradesReturn": "18",
        "inputSearchFerriesBean.dateGoing": "24/03/21",
        "inputSearchFerriesBean.dateReturn": "24/03/21",
        "_csrf": csrf
    })
    return params

def get_request(params):
    url = "https://www.booking.alilaurogruson.it/booking/services/search.action"
    r = requests.post(url, data=params, headers=headers, verify=False, timeout=10)
    return r

1 answer
User
#1 · Posted 2024-04-24 05:06:46

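Using a session lets you reuse cookies across requests, and cookies play an important role in getting the content you want. You also do not need to URL-encode the parameters yourself: pass them as a dict and the requests module knows how to handle it.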
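To illustrate the point about encoding, here is a minimal sketch that builds two POST requests without sending them. The URL and the `abc123` token are placeholders, not values from the real site. It shows that a dict passed as `data` is form-encoded by requests and gets a form Content-Type header, while a pre-encoded string is sent as-is without that header.

```python
import urllib.parse

import requests

# Placeholder endpoint and values (assumptions); nothing is sent over the network.
url = "https://example.com/search.action"
form = {"inputSearchFerriesBean.dateGoing": "24/03/21", "_csrf": "abc123"}

# Passing a dict: requests form-encodes it and sets the Content-Type header.
from_dict = requests.Request("POST", url, data=form).prepare()

# Passing a pre-encoded string: the body is used unchanged and requests
# does not add a form Content-Type on its own.
encoded = urllib.parse.urlencode(form)
from_str = requests.Request("POST", url, data=encoded).prepare()

print(from_dict.body)
print(from_dict.headers.get("Content-Type"))
print(from_str.headers.get("Content-Type"))
```

Whether the missing header matters depends on the server, but passing the dict avoids the question entirely.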

Try the following:

import urllib3
import requests
from bs4 import BeautifulSoup

urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

def get_csrf_token(s):
    # Fetch the form through the session so its cookies are stored on `s`.
    url = 'https://www.booking.alilaurogruson.it/booking/services/datas'
    r = s.get(url, verify=False, timeout=10)
    csrf = search_csrf_token(r)
    return csrf

def search_csrf_token(r):
    # Read the hidden _csrf input from the form HTML.
    soup = BeautifulSoup(r.content, 'html.parser')
    csrf = soup.find('input', attrs={'name': '_csrf'})['value']
    return csrf

def set_params(csrf):
    params = {
        "inputSearchFerriesBean.areasGoing": "1002",
        "inputSearchFerriesBean.tradesGoing": "12",
        "inputSearchFerriesBean.ret": "true",
        "inputSearchFerriesBean.areasReturn": "1007",
        "inputSearchFerriesBean.tradesReturn": "18",
        "inputSearchFerriesBean.dateGoing": "24/03/21",
        "inputSearchFerriesBean.dateReturn": "24/03/21",
        "_csrf": csrf
    }
    return params

def get_request(s, params):
    # POST through the same session; requests form-encodes the dict.
    url = "https://www.booking.alilaurogruson.it/booking/services/search.action"
    r = s.post(url, data=params, verify=False, timeout=10)
    return r


with requests.Session() as s:
    # One session for both requests, so cookies from the GET are sent with the POST.
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36'
    token = get_csrf_token(s)
    params = set_params(token)
    s.headers['Referer'] = 'https://www.booking.alilaurogruson.it/booking/services/datas'
    res = get_request(s, params)
    # The "lxml" parser requires the lxml package to be installed.
    soup = BeautifulSoup(res.text, "lxml")
    item = soup.select_one(".sectionTitle > span").text
    print(item)
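
To see why sharing one session can make the difference, here is a self-contained sketch that uses only the standard library and a throwaway local server. It assumes the server accepts a `_csrf` token only together with the session cookie issued alongside it; that is a common pattern but is not confirmed for the booking site. The handler, the `SID` cookie name, the paths and the token values are all hypothetical.

```python
import http.cookiejar
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical server state: session id -> token issued for that session.
SESSIONS = {"sess-1": "token-1"}


class DemoHandler(BaseHTTPRequestHandler):
    def _send(self, body, cookie=None):
        data = body.encode()
        self.send_response(200)
        if cookie:
            self.send_header("Set-Cookie", cookie)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        # Serve the form and hand out the session cookie.
        self._send('<input name="_csrf" value="token-1">', "SID=sess-1; Path=/")

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode())
        sid = (self.headers.get("Cookie") or "").replace("SID=", "")
        token = form.get("_csrf", [""])[0]
        # The token only counts when it arrives with the matching cookie.
        self._send("RESULTS" if SESSIONS.get(sid) == token else "FORM AGAIN")

    def log_message(self, *args):
        pass  # keep the demo output quiet


def post_search(opener, base):
    # GET the form first (stores the cookie if the opener has a cookie jar).
    with opener.open(base + "/form") as resp:
        resp.read()
    data = urllib.parse.urlencode({"_csrf": "token-1"}).encode()
    with opener.open(base + "/search", data=data) as resp:
        return resp.read().decode()


def run_demo():
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    no_proxy = urllib.request.ProxyHandler({})  # ignore any proxy settings
    try:
        plain = urllib.request.build_opener(no_proxy)
        jar = http.cookiejar.CookieJar()
        keeps_cookies = urllib.request.build_opener(
            no_proxy, urllib.request.HTTPCookieProcessor(jar)
        )
        return post_search(plain, base), post_search(keeps_cookies, base)
    finally:
        server.shutdown()
        server.server_close()


without_cookies, with_cookies = run_demo()
print("no cookie jar:", without_cookies)
print("cookie jar:   ", with_cookies)
```

The client without a cookie jar behaves like two separate `requests.get` / `requests.post` calls, and the one with a jar behaves like a `requests.Session`.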
