Using dryscrape and BeautifulSoup to get all rows from a JS-rendered iframe

import dryscrape
from bs4 import BeautifulSoup

myurl = 'http://apps2.eere.energy.gov/wind/windexchange/economics_tools.asp'
session = dryscrape.Session()
session.visit(myurl)
response = session.body()
soup = BeautifulSoup(response, 'lxml')
table = soup.find_all("td")

1 Answer

User

#1 · Posted 2024-04-19 14:46:54

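You don't need dryscrape for this particular page. The entire table you are trying to get is already in the source HTML, so you can simply do the following: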

from bs4 import BeautifulSoup
import requests

myurl = 'http://apps2.eere.energy.gov/wind/windexchange/economics_tools.asp'
soup = BeautifulSoup(requests.get(myurl).text,'lxml')
table = soup.find_all("td")
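
Since `find_all("td")` returns one flat list of cells, a small helper can regroup them into rows. The following is only a sketch: the sample HTML and its tool names are made up for illustration and are not taken from the real page, and `html.parser` is used here instead of `lxml` just to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page source; the real table has different content.
SAMPLE_HTML = """
<table>
  <tr><th>Tool</th><th>Type</th></tr>
  <tr><td>Tool A</td><td>Spreadsheet</td></tr>
  <tr><td>Tool B</td><td>Web app</td></tr>
</table>
"""


def table_rows(html):
    """Return a list of rows, each row being a list of <td> cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip rows that contain only <th> header cells
            rows.append(cells)
    return rows


rows = table_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

With the real page, `requests.get(myurl).text` would presumably be passed in place of `SAMPLE_HTML`.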

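Or, with your current setup:

[code snippet missing from the source]

which gives you the nodes for the td tags directly within the dryscrape session. In that case you would not need BeautifulSoup.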
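
The snippet for this step did not survive in the text above, so what follows is a guess at its spirit rather than the answerer's actual code. It assumes the session's `xpath()` selector is used to fetch the td nodes and that each node's `text()` method gives its inner text; the helper name `td_texts` is hypothetical.

```python
def td_texts(session):
    """Return the stripped text of every <td> node in the session's current DOM.

    `session` is expected to be a dryscrape.Session that has already visited a page.
    """
    # xpath() returns a list of node objects; text() reads a node's inner text.
    return [node.text().strip() for node in session.xpath("//td")]


def main():
    # Imported here so td_texts() can be reused without dryscrape installed.
    import dryscrape

    session = dryscrape.Session()
    session.visit("http://apps2.eere.energy.gov/wind/windexchange/economics_tools.asp")
    for text in td_texts(session):
        print(text)
```

Calling `main()` runs the helper against the page from the question; `session.css("td")` should be an equivalent way to select the same nodes.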

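session.body() gives you the HTML that is currently loaded in the DOM, because JavaScript performs the action and changes what is in the DOM. Because of that, you could write a for loop that clicks each next button and feeds the body into BeautifulSoup after every iteration, but that seems unnecessary to me.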
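
The loop described above might look roughly like this. It is a sketch under assumptions: the CSS selector for the next button is not given in the answer, so `next_selector` must be supplied by the caller, and the fixed `delay` is a crude stand-in for waiting until the JavaScript has updated the DOM. The function name `collect_bodies` and its parameters are hypothetical.

```python
import time


def collect_bodies(session, next_selector, max_pages=20, delay=1.0):
    """Collect session.body() once per page, clicking the next button in between.

    Stops when no element matches next_selector or after max_pages pages.
    """
    bodies = []
    for _ in range(max_pages):
        # Snapshot of the HTML currently loaded in the DOM.
        bodies.append(session.body())
        # at_css() returns the first matching node, or None if nothing matches.
        next_button = session.at_css(next_selector)
        if next_button is None:
            break
        next_button.click()
        # Give the page's JavaScript time to update the DOM (assumed to suffice).
        time.sleep(delay)
    return bodies
```

Each returned string can then be passed to `BeautifulSoup(html, 'lxml')` as in the first snippet. The `max_pages` cap matters if the next button stays in the DOM on the last page, since the loop would otherwise not end on its own.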

useful reference
