Python web scraping with a search form and a non-dynamic URI

import requests
from bs4 import BeautifulSoup

url = 'http://comprasnet.gov.br/acesso.asp?url=/ConsultaLicitacoes/ConsLicitacao_texto.asp'
html = requests.get(url)
bsObj = BeautifulSoup(html.content, 'html.parser')
print(bsObj)
# From here on I can't get any further

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import requests
from bs4 import BeautifulSoup

# Access the page and type the search term into the field
driver = webdriver.Chrome()
driver.get('http://comprasnet.gov.br/acesso.asp?url=/ConsultaLicitacoes/ConsLicitacao_texto.asp')
driver.switch_to.frame('main2')
busca = driver.find_element_by_id("txtTermo")
busca.send_keys("GESTAO DE PESSOAS")
#data_inicio = driver.find_element_by_id('dt_publ_ini')
#data_inicio.send_keys("01/01/2018")
#data_fim = driver.find_element_by_id('dt_publ_fim')
#data_fim.send_keys('20/12/2018')
botao = driver.find_element_by_id('ok')
botao.click()

1 answer

User

#1 · Posted 2024-04-18 22:02:18

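The problem is that your initial search page uses frames for the search form and the results, which makes it hard for BeautifulSoup to work with. I was able to get the search results by using a slightly different URL and MechanicalSoup: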

>>> from mechanicalsoup import StatefulBrowser
>>> sb = StatefulBrowser()
>>> sb.open('http://comprasnet.gov.br/ConsultaLicitacoes/ConsLicitacao_texto.asp')
<Response [200]>
>>> sb.select_form()  # select the search form
<mechanicalsoup.form.Form object at 0x7f2c10b1bc18>
>>> sb['txtTermo'] = 'search text'  # input the text to search for
>>> sb.submit_selected()  # submit the form
<Response [200]>
>>> page = sb.get_current_page()  # get the returned page in BeautifulSoup form
>>> type(page)
<class 'bs4.BeautifulSoup'>

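Note that the URL I use here is that of the frame containing the search form, not the page you gave, which only embeds that form. This removes one layer of indirection.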
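If you need to find a frame's own URL for some other frameset page, a minimal sketch along these lines may help. It uses only the standard library so it runs offline. The markup in SAMPLE_HTML, the example.com address, and the names FrameCollector and frame_urls are made up for illustration; they are not taken from the comprasnet site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameCollector(HTMLParser):
    """Collect (name, src) pairs from <frame> and <iframe> tags."""

    def __init__(self):
        super().__init__()
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag in ("frame", "iframe"):
            attr = dict(attrs)
            if attr.get("src"):
                self.frames.append((attr.get("name"), attr["src"]))


def frame_urls(html, base_url):
    """Map each frame name to its absolute URL, resolved against base_url."""
    parser = FrameCollector()
    parser.feed(html)
    return {name: urljoin(base_url, src) for name, src in parser.frames}


# Hypothetical frameset markup, invented for this example
SAMPLE_HTML = """
<html><frameset rows="20%,80%">
  <frame name="top" src="/header.asp">
  <frame name="main2" src="/search/form.asp">
</frameset></html>
"""

urls = frame_urls(SAMPLE_HTML, "http://example.com/acesso.asp")
print(urls["main2"])  # the URL to open directly instead of the outer page
```

On a real site you would pass the outer page's HTML and URL instead of the sample values, then open the frame URL you are interested in.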

MechanicalSoup is built on top of BeautifulSoup and provides tools for interacting with websites, similar to the older mechanize library.
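To give a rough idea of what selecting a form, setting a field, and submitting it amounts to, here is a standard-library sketch that reads a form's inputs, overrides one value, and builds the request that would be sent. This is not how MechanicalSoup is implemented, only an illustration of the idea. The form markup, the example.com address, the "page" field, and the names FormReader and build_request are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin


class FormReader(HTMLParser):
    """Record the first form's action, method, and named input defaults."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attr.get("action") or ""
            self.method = (attr.get("method") or "get").lower()
        elif tag == "input" and self._in_form and attr.get("name"):
            # Inputs without a name attribute are never sent, so skip them
            self.fields[attr["name"]] = attr.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_request(html, page_url, **overrides):
    """Return (method, target URL, encoded form data) for the first form."""
    reader = FormReader()
    reader.feed(html)
    data = {**reader.fields, **overrides}
    return reader.method, urljoin(page_url, reader.action), urlencode(data)


# Hypothetical search form, invented for this example
SAMPLE_FORM = """
<form method="post" action="results.asp">
  <input type="text" name="txtTermo" value="">
  <input type="hidden" name="page" value="1">
  <input type="submit" id="ok" value="OK">
</form>
"""

method, target, body = build_request(
    SAMPLE_FORM, "http://example.com/search/form.asp", txtTermo="GESTAO DE PESSOAS"
)
print(method, target, body)
```

A library such as MechanicalSoup additionally handles cookies, other control types (selects, checkboxes, textareas), and the actual HTTP request, which is why it is usually more convenient than doing this by hand.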
