Problem sending data with a POST request in Python

Posted 2024-05-12 18:11:03


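I am trying to enter a decision start date and end date into the two input boxes on the Gosport Council website by sending a POST request. When I print the text received after sending the request, I get the content of the input page rather than the results page that should load.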

import requests

payload = {
    "applicationDecisionStart": "1/8/2018",
    "applicationDecisionEnd": "1/10/2018",
}

with requests.Session() as session:
    r = session.get("https://publicaccess.gosport.gov.uk/online-applications/search.do?action=advanced", timeout=10, data=payload)

    print(r.text)

When I run it, I expect it to print HTML containing, for example, href links such as <a href="/online-applications/applicationDetails.do?keyVal=PEA12JHO07E00&amp;activeTab=summary">, but my code does not show anything like that.


2 answers

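The URL and the data are both incorrect.

Use Chrome to analyze the response.

Press F12 to open the developer tools, switch to the "Network" tab, then submit the page and inspect the first request Chrome sends.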

You need:

  1. Headers > General > Request URL
  2. Headers > Request Headers
  3. Headers > Form Data
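
The three items above correspond to the pieces of a request object. As a rough sketch using only the standard library, the request can be built without sending it, to inspect what would go on the wire. The URL and form fields are the ones from the observed request shown later in this answer; the header value is a placeholder assumption, not copied from the site.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

# 1. Request URL, as shown under Headers > General in the Network tab.
url = ("https://publicaccess.gosport.gov.uk/online-applications/"
       "advancedSearchResults.do?action=firstPage")

# 2. Request headers. This value is a placeholder assumption; copy the real
#    ones from the Network tab if the server turns out to require them.
headers = {"Content-Type": "application/x-www-form-urlencoded"}

# 3. Form data: the non-empty fields of the submitted form.
form_data = {
    "caseAddressType": "Application",
    "date(applicationDecisionStart)": "1/8/2018",
    "date(applicationDecisionEnd)": "1/10/2018",
    "searchType": "Application",
}

# Encode the form as the request body; nothing is sent over the network here.
body = urlencode(form_data).encode("ascii")
request = Request(url, data=body, headers=headers, method="POST")

print(request.get_method())     # the HTTP verb that would be used
print(request.full_url)         # the URL that would be requested
print(parse_qs(body.decode()))  # the form fields, decoded back from the body
```

Comparing this output with what the Network tab shows is a quick way to spot a wrong URL, a wrong verb, or misnamed form fields.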

You will also need a package to parse the HTML, such as bs4.
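
As a small illustration of what bs4 does here, the sketch below parses a made-up HTML fragment and extracts each link's text and href with a CSS selector. The fragment is an assumption standing in for the real results page, whose markup may differ. It uses the built-in html.parser backend so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating a results list; the real page may differ.
html = """
<ul id="searchresults">
  <li><a href="/online-applications/applicationDetails.do?keyVal=PEA12JHO07E00&amp;activeTab=summary">
      Example application
  </a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Select every <a> inside the element with id="searchresults" and keep
# its stripped text together with its href attribute.
links = [(a.get_text(strip=True), a["href"]) for a in soup.select("#searchresults a")]
print(links)
```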

The request I observed is a POST, not a GET, and looks like this (empty fields in the POST are omitted):

from bs4 import BeautifulSoup as bs
import requests

# Non-empty form fields observed in the POST sent by the browser
payload = {
    'caseAddressType': 'Application',
    'date(applicationDecisionStart)': '1/8/2018',
    'date(applicationDecisionEnd)': '1/10/2018',
    'searchType': 'Application',
}

with requests.Session() as s:
    # Submit the search form with POST to the results URL, not the form URL
    r = s.post('https://publicaccess.gosport.gov.uk/online-applications/advancedSearchResults.do?action=firstPage', data=payload)
    soup = bs(r.content, 'lxml')
    # Collect (link text, href) for every link in the results list
    info = [(item.text.strip(), item['href']) for item in soup.select('#searchresults a')]
    print(info)
    # Later pages are fetched from URLs of this form:
    # https://publicaccess.gosport.gov.uk/online-applications/pagedSearchResults.do?action=page&searchCriteria.page=2

Looping over the pages:

from bs4 import BeautifulSoup as bs
import requests

# Non-empty form fields observed in the POST sent by the browser
payload = {
    'caseAddressType': 'Application',
    'date(applicationDecisionStart)': '1/8/2018',
    'date(applicationDecisionEnd)': '1/10/2018',
    'searchType': 'Application',
}

with requests.Session() as s:
    # The first page comes from the POST; the session persists cookies for the later GETs
    r = s.post('https://publicaccess.gosport.gov.uk/online-applications/advancedSearchResults.do?action=firstPage', data=payload)
    soup = bs(r.content, 'lxml')
    info = [(item.text.strip(), item['href']) for item in soup.select('#searchresults a')]
    print(info)
    # The text of the last pager link is taken to be the total number of pages
    pages = int(soup.select('span + a.page')[-1].text)

    # Pages 2 onward are plain GET requests within the same session
    for page in range(2, pages + 1):
        r = s.get('https://publicaccess.gosport.gov.uk/online-applications/pagedSearchResults.do?action=page&searchCriteria.page={}'.format(page))
        soup = bs(r.content, 'lxml')
        info = [(item.text.strip(), item['href']) for item in soup.select('#searchresults a')]
        print(info)
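
The later-page URLs follow a fixed pattern, so they can be generated separately from the fetching. The sketch below is one possible way to do that; the helper name `later_page_urls` is hypothetical. Returning an empty list for a single page also makes that case explicit, which matters because the `[-1]` lookup in the loop above would raise an IndexError if the results contain no pager links.

```python
from urllib.parse import urlencode

# Paging endpoint taken from the URLs used in the answer above.
BASE = "https://publicaccess.gosport.gov.uk/online-applications/pagedSearchResults.do"


def later_page_urls(last_page):
    """Return the URLs for pages 2..last_page (empty if there is only one page)."""
    return [
        BASE + "?" + urlencode({"action": "page", "searchCriteria.page": page})
        for page in range(2, last_page + 1)
    ]


print(later_page_urls(3))
```

Each returned URL could then be passed to `s.get(...)` inside the same session as the initial POST.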
