Querying an ASP.NET page in Python

Published 2024-04-26 05:50:05


I'm new to web scraping, and I've spent half a day trying to figure out how to simulate a query on an ASP.NET web page.

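The starting page is http://cookcountypropertyinfo.com/default.aspx. I'm trying to use requests to enter "235 W Van Buren St, Chicago" into the "Search by Property Address" form. When I submit the query manually, the URL changes to http://cookcountypropertyinfo.com/pinresults.aspx. I want to build a list/dictionary containing all the results and links on the response page. This is my code so far, but it isn't sending the HTTP POST request.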
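Simulating an ASP.NET form submission means sending a postback that echoes back the hidden state fields the page rendered (`__VIEWSTATE`, `__VIEWSTATEGENERATOR`, `__EVENTVALIDATION`, and possibly others), not only the visible inputs. A minimal sketch, assuming you would rather collect every hidden input generically than select three of them by id; it uses only the standard library's `html.parser`, and the names `HiddenInputCollector` and `collect_hidden_fields` are hypothetical.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collects the name/value pair of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <input ... /> tags also arrive here, because the default
        # handle_startendtag() calls handle_starttag().
        if tag != "input":
            return
        attr_map = dict(attrs)
        input_type = (attr_map.get("type") or "").lower()
        name = attr_map.get("name")
        if input_type == "hidden" and name:
            # A missing value attribute is posted as an empty string.
            self.fields[name] = attr_map.get("value") or ""


def collect_hidden_fields(html):
    """Return {name: value} for all hidden inputs in an HTML string."""
    parser = HiddenInputCollector()
    parser.feed(html)
    parser.close()
    return parser.fields
```

The result can then be merged with the visible search fields before posting, e.g. `payload = {**collect_hidden_fields(r.text), **search_fields}`, where `search_fields` is a hypothetical dict holding the house number, street name and city.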

from bs4 import BeautifulSoup
import requests

#test with 235 W Van Buren St
URL = 'http://cookcountypropertyinfo.com/default.aspx'
URL2 = 'http://cookcountypropertyinfo.com/pinresults.aspx'

#headers
HEADERS = {
    #header names use a hyphen; 'User_Agent' would not replace requests' default User-Agent
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0',
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
}

#start new session
session = requests.Session()
r = session.get(URL,headers=HEADERS)

#create soup object
soup = BeautifulSoup(r.content,"html.parser")

#ASP validation and session fields
view_state = soup.select("#__VIEWSTATE")[0]['value']
view_state_generator = soup.select("#__VIEWSTATEGENERATOR")[0]['value']
event_validation = soup.select("#__EVENTVALIDATION")[0]['value']

#create FORM_FIELDS
FORM_FIELDS = {
    '__EVENTTARGET': 'ctl00$ContentPlaceHolder1$PINAddressSearch$btnAddress',
    '__EVENTARGUMENT':'',
    '__VIEWSTATE': view_state,
    '__VIEWSTATEGENERATOR':view_state_generator,
    '__EVENTVALIDATION':event_validation,
    'ctl00$ContentPlaceHolder1$PINAddressSearch$pinBox1':'',
    'ctl00$ContentPlaceHolder1$PINAddressSearch$pinBox2':'',
    'ctl00$ContentPlaceHolder1$PINAddressSearch$pinBox3':'',
    'ctl00$ContentPlaceHolder1$PINAddressSearch$pinBox4':'',
    'ctl00$ContentPlaceHolder1$PINAddressSearch$pinBox5':'',
    'ctl00$ContentPlaceHolder1$PINAddressSearch$houseNumber':'235',
    'ctl00$ContentPlaceHolder1$PINAddressSearch$txtStreetName':"W Van Buren St",
    'ctl00$ContentPlaceHolder1$PINAddressSearch$txtUnit':'',
    'ctl00$ContentPlaceHolder1$PINAddressSearch$txtCity':'Chicago',
    'ctl00$ContentPlaceHolder1$PINAddressSearch$txtZipCode':''
}

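#POST the form fields back to default.aspx (an ASP.NET postback);
#if the server answers with a redirect, requests follows it automatically
r = session.post(URL,data=FORM_FIELDS,headers=HEADERS)
print(r.url)  #final URL after any redirects, to check where the response came from
soup = BeautifulSoup(r.content,"html.parser")
records = soup.find_all('div',class_='linkaddressresult')
print(records)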
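To turn the response into the list/dictionary described in the question, each result's text and link can be extracted and resolved to an absolute URL. A minimal sketch with the standard library's `html.parser`; it assumes, without having checked the live site, that the result links are `<a href>` elements inside the `div.linkaddressresult` containers searched for above. `ResultLinkParser` and `parse_results` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResultLinkParser(HTMLParser):
    """Collects {'text', 'url'} for each <a href> inside a result div.

    Assumption: results are marked up as <div class="linkaddressresult">
    containing links; this is not verified against the real page.
    """

    def __init__(self, base_url, result_class="linkaddressresult"):
        super().__init__()
        self.base_url = base_url
        self.result_class = result_class
        self.div_depth = 0   # divs open inside the current result div (0 = outside)
        self.current = None  # link whose text is being read
        self.results = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "div":
            classes = (attr_map.get("class") or "").split()
            if self.div_depth:
                self.div_depth += 1  # nested div inside a result
            elif self.result_class in classes:
                self.div_depth = 1   # entering a result div
        elif tag == "a" and self.div_depth and attr_map.get("href"):
            # Resolve relative hrefs against the page the response came from.
            self.current = {"text": "", "url": urljoin(self.base_url, attr_map["href"])}

    def handle_data(self, data):
        if self.current is not None:
            self.current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self.current is not None:
            # Collapse runs of whitespace in the link text.
            self.current["text"] = " ".join(self.current["text"].split())
            self.results.append(self.current)
            self.current = None
        elif tag == "div" and self.div_depth:
            self.div_depth -= 1


def parse_results(html, base_url):
    """Return a list of {'text': ..., 'url': ...} dicts for the result links."""
    parser = ResultLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.results
```

With the response from the code above, this would be called as `parse_results(r.text, r.url)`; passing `r.url` matters because relative links resolve against the final URL after any redirect. Staying with BeautifulSoup, a roughly equivalent one-liner is `[{'text': a.get_text(" ", strip=True), 'url': urljoin(r.url, a['href'])} for div in records for a in div.find_all('a', href=True)]`.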

Can anyone help me? Thanks a lot.

