How to extract the following src (iframe) from a page using Python (BeautifulSoup)

Posted 2024-03-28 09:53:32


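I am trying to extract the "src" from this page, without success. The page is dynamic, and the iframe only shows up after I run a search.

Site: http://191.253.16.180:8080/ConsultaLei/Default.aspx?numero=3001

View source: http://191.253.16.180:8080/ConsultaLei/Default.aspx?numero=3001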

import requests
from bs4 import BeautifulSoup

r = requests.get("http://191.253.16.180:8080/ConsultaLei/Default.aspx?numero=3001")
arquivo = BeautifulSoup(r.content, "html.parser")
for link in arquivo.find_all("iframe"):
    print(link)

1 answer
User
#1 · Posted 2024-03-28 09:53:32

To simulate the POST request that the site's search form sends, you can use the following example:

import requests
from bs4 import BeautifulSoup

url = "http://191.253.16.180:8080/ConsultaLei/Default.aspx"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Start from every input that already carries a value on the search page
data = {}
for inp in soup.select("input[value]"):
    data[inp["name"]] = inp["value"]

data["ctl00$MainContent$txtNumero"] = "3001"  # <-- the number to search for
data["ctl00$MainContent$ddlEspecie"] = ""
data["ctl00$MainContent$ddlAno"] = ""
data["ctl00$MainContent$txtConteudo"] = ""
data["ctl00$MainContent$txtEmenta"] = ""
data["ctl00$MainContent$imgBuscar.x"] = "1"
data["ctl00$MainContent$imgBuscar.y"] = "9"

soup = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
print(soup.iframe["src"])

Prints:

../procuradoriacg/Leis\1994/8277_LEI30011994pag0001_strDocumentoOficial.pdf
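The printed src is relative to the search page and contains a backslash. If you want a full URL, a minimal sketch using the standard library's urljoin could look like this. The helper name absolute_src is hypothetical, and treating the backslash as a forward slash is an assumption about how the server lays out its paths, so check the resulting address before relying on it.

```python
from urllib.parse import urljoin

# URL of the search page, as used in the answer above
PAGE_URL = "http://191.253.16.180:8080/ConsultaLei/Default.aspx"


def absolute_src(src, page_url=PAGE_URL):
    """Resolve a relative iframe src against the page URL.

    Assumption: backslashes in the src stand for path separators,
    so they are replaced with forward slashes before joining.
    """
    return urljoin(page_url, src.replace("\\", "/"))


relative = "../procuradoriacg/Leis\\1994/8277_LEI30011994pag0001_strDocumentoOficial.pdf"
print(absolute_src(relative))
```

urljoin resolves the leading "../" against the directory of Default.aspx, so the result points one level above /ConsultaLei/.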

Edit: To fetch multiple pages, you can do the following:

import requests
from bs4 import BeautifulSoup

url = "http://191.253.16.180:8080/ConsultaLei/Default.aspx"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

data = {}
for inp in soup.select("input[value]"):
    data[inp["name"]] = inp["value"]

data["ctl00$MainContent$ddlEspecie"] = ""
data["ctl00$MainContent$ddlAno"] = ""
data["ctl00$MainContent$txtConteudo"] = ""
data["ctl00$MainContent$txtEmenta"] = ""
data["ctl00$MainContent$imgBuscar.x"] = "1"
data["ctl00$MainContent$imgBuscar.y"] = "9"


for i in range(3000, 3010):
    data["ctl00$MainContent$txtNumero"] = str(i)  # form values are sent as text

    s = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
    if s.find("iframe"):
        print(i, s.iframe["src"])
    else:
        print(i, "Not Found")

Prints:

3000 Not Found
3001 ../procuradoriacg/Leis\1994/8277_LEI30011994pag0001_strDocumentoOficial.pdf
3002 Not Found
3003 ../procuradoriacg/Leis\1994/8279_LEI30031994pag0001_strDocumentoOficial.pdf
3004 Not Found
3005 Not Found
3006 ../procuradoriacg/Leis\1994/8282_LEI30061994pag0001_strDocumentoOficial.pdf
3007 Not Found
3008 Not Found
3009 Not Found
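The two steps the code above relies on (copying the form's existing input values, then reading the iframe's src from the response) can be tried offline on sample HTML. The sketch below is only an illustration: the helper names form_fields and iframe_src and both HTML snippets are made up for this example, and the __VIEWSTATE field is an assumption about what an ASP.NET page typically contains, not something taken from the real site. It also selects "input[name][value]" rather than "input[value]", so an input without a name attribute cannot raise a KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the real search page and result page
SAMPLE_FORM = """
<form>
  <input type="hidden" name="__VIEWSTATE" value="abc123">
  <input type="text" name="ctl00$MainContent$txtNumero" value="">
  <input type="submit" value="no name here">
</form>
"""
SAMPLE_RESULT = '<html><body><iframe src="../docs/example.pdf"></iframe></body></html>'


def form_fields(html):
    """Collect name/value pairs from inputs that have both attributes."""
    soup = BeautifulSoup(html, "html.parser")
    return {inp["name"]: inp["value"] for inp in soup.select("input[name][value]")}


def iframe_src(html):
    """Return the src of the first iframe, or None if there is no iframe."""
    iframe = BeautifulSoup(html, "html.parser").find("iframe")
    return iframe.get("src") if iframe is not None else None


fields = form_fields(SAMPLE_FORM)
print(fields)
print(iframe_src(SAMPLE_RESULT))
print(iframe_src("<html><body>no results</body></html>"))
```

Returning None instead of printing "Not Found" leaves it to the caller to decide how to report numbers that have no document.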
