Scraping address information with Selenium in Python

Published 2024-05-14 06:08:10

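I'm trying to scrape address information from https://www.smartystreets.com/products/single-address-iframe. I have a script that searches for the address passed in its arguments. When I look at the site itself, I can see various fields, such as Carrier Route.

Taking 3301 South Greenfield Rd, Gilbert, AZ 85297 as a hypothetical example, when I open the page manually I can see Carrier Route: R109.

However, I'm having trouble locating the Carrier Route with Selenium so that I can scrape it. Do you have any suggestions for finding the Carrier Route for any given address?

Starting code: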

from selenium import webdriver
from selenium.webdriver.common.by import By

# Selenium 4.6+ locates chromedriver automatically; older versions need the driver path
driver = webdriver.Chrome()
address = "3301 South Greenfield Rd Gilbert, AZ 85297\n"
url = 'https://www.smartystreets.com/products/single-address-iframe'
driver.get(url)
# Selenium 4 locator style: find_element(By.ID, ...)
driver.find_element(By.ID, "lookup-select-button").click()
driver.find_element(By.ID, "lookup-select").find_element(By.ID, "address-freeform").click()
driver.find_element(By.ID, "freeform-address").send_keys(address)
# Find Carrier Route here

2 Answers

Ajax1234, here are the code and screenshots you asked for:

[Two screenshots, not available here]

You can use driver.execute_script to fill in the form fields and click the submit button:

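from selenium import webdriver

d = webdriver.Chrome()  # Selenium 4.6+ locates chromedriver automatically
d.get('https://www.smartystreets.com/products/single-address-iframe')
s = '3301 South Greenfield Rd Gilbert, AZ 85297'
# Split into the street part and "City, ST ZIP"; the ' Rd ' separator itself is not kept
a, a1 = s.split(' Rd ')
city, state, zip_code = a1.replace(',', '').split()
# Fill in the form, submit it, and return the text of the second metadata item (Carrier Route).
# The values are passed as script arguments so quotes in an address cannot break the JavaScript.
route = d.execute_script('''
   document.querySelector('#address-line1').value = arguments[0]
   document.querySelector('#city').value = arguments[1]
   document.querySelector('#state').value = arguments[2]
   document.querySelector('#zip-code').value = arguments[3]
   document.querySelector('#submit-request').click()
   return document.querySelector('#us-street-metadata li:nth-of-type(2) .answer.col-sm-5.col-xs-3').textContent
''', a, city, state, zip_code)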

Output:

'R109'
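
The `s.split(' Rd ')` step above only works for addresses that contain " Rd ", and it leaves the suffix out of the street value. As a rough sketch of a more general split, the helper below breaks a "street city, ST ZIP" string into the four form-field values. The function name `split_address`, the `STREET_SUFFIXES` set, and the returned key names are all hypothetical, and whether the form expects the suffix in the street field is not stated in this thread.

```python
import re

# Assumption: a small, illustrative set of street suffixes; extend it as needed.
STREET_SUFFIXES = {"Rd", "St", "Ave", "Blvd", "Dr", "Ln", "Way", "Ct"}


def split_address(freeform):
    """Split 'street city, ST ZIP' into street, city, state and ZIP code."""
    match = re.fullmatch(
        r"(.+),\s*([A-Z]{2})\s+(\d{5}(?:-\d{4})?)", freeform.strip()
    )
    if match is None:
        raise ValueError(f"unrecognized address format: {freeform!r}")
    street_and_city, state, zip_code = match.groups()

    words = street_and_city.split()
    # Treat the last word that is a known suffix as the end of the street part.
    for i in range(len(words) - 1, -1, -1):
        if words[i].rstrip(".") in STREET_SUFFIXES:
            break
    else:
        raise ValueError(f"no known street suffix in: {street_and_city!r}")

    street = " ".join(words[: i + 1])
    city = " ".join(words[i + 1:])
    if not city:
        raise ValueError(f"no city found in: {street_and_city!r}")
    return {"street": street, "city": city, "state": state, "zip_code": zip_code}


print(split_address("3301 South Greenfield Rd Gilbert, AZ 85297"))
```

The resulting values could be passed to `execute_script` in place of `a`, `city`, `state` and `zip_code`. Because street and city are not separated by a comma in the input, the suffix heuristic can guess wrong for city names that begin with a suffix-like word (for example "St Louis").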

To get a complete view of all the returned fields, you can use BeautifulSoup:

from bs4 import BeautifulSoup as soup
... # selenium driver code here
# Each div under #us-street-output is one section; its h4 holds the section title
cols = soup(d.page_source, 'html.parser').select('#us-street-output div')
# Map section title -> {label (last character stripped): value} for every list item
data = {i.h4.text:{b.select_one('span:nth-of-type(1)').get_text(strip=True)[:-1]:b.select_one('span:nth-of-type(2)').get_text(strip=True)
         for b in i.select('ul li')} for i in cols}
print(data)
print(data['Metadata']['Congressional District'])

Output:

{'Metadata': {'Building Default': 'default', 'Carrier Route': 'R109', 'Congressional District': '05', 'Latitude': '33.291248', 'Longitude': '-111.737427', 'Coordinate Precision': 'Rooftop', 'County Name': 'Maricopa', 'County FIPS': '04013', 'eLOT Sequence': '0160', 'eLOT Sort': 'A', 'Observes DST': 'default', 'RDI': 'Commercial', 'Record Type': 'S', 'Time Zone': 'Mountain', 'ZIP Type': 'Standard'}, 'Analysis': {'Vacant': 'N', 'DPV Match Code': 'Y', 'DPV Footnotes': 'AABB', 'General Footnotes': 'L#', 'CMRA': 'N', 'EWS Match': 'default', 'LACSLink Code': 'default', 'LACSLink Indicator': 'default', 'SuiteLink Match': 'default', 'Enhanced Match': 'default'}, 'Components': {'Urbanization': 'default', 'Primary Number': '3301', 'Street Predirection': 'S', 'Street Name': 'Greenfield', 'Street Postdirection': 'default', 'Street Suffix': 'Rd', 'Secondary Designator': 'default', 'Secondary Number': 'default', 'Extra Secondary Designator': 'default', 'Extra Secondary Number': 'default', 'PMB Designator': 'default', 'PMB Number': 'default', 'City': 'Gilbert', 'Default City Name': 'Gilbert', 'State': 'AZ', 'ZIP Code': '85297', '+4 Code': '2176', 'Delivery Point': '01', 'Check Digit': '2'}}
'05'
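
Once the nested `data` dict is available, a field such as Carrier Route can be looked up without knowing which section it sits in. The sketch below uses a small excerpt of the output above; the helper names `find_field` and `drop_defaults` are hypothetical, and reading `'default'` as a placeholder for an empty field is an assumption based only on how the output looks.

```python
# A small excerpt of the `data` dict printed above.
data = {
    "Metadata": {
        "Building Default": "default",
        "Carrier Route": "R109",
        "Congressional District": "05",
    },
    "Analysis": {"Vacant": "N", "DPV Match Code": "Y"},
    "Components": {"City": "Gilbert", "State": "AZ", "ZIP Code": "85297"},
}


def find_field(data, name):
    """Return (section, value) for the first section containing `name`, else None."""
    for section, fields in data.items():
        if name in fields:
            return section, fields[name]
    return None


def drop_defaults(data, placeholder="default"):
    """Return a copy of `data` without fields whose value equals `placeholder`."""
    return {
        section: {k: v for k, v in fields.items() if v != placeholder}
        for section, fields in data.items()
    }


print(find_field(data, "Carrier Route"))
print(drop_defaults(data)["Metadata"])
```

Looking fields up by label this way avoids depending on the position of an item in the page, which the `li:nth-of-type(2)` selector in the first snippet relies on.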
