Python Beautiful Soup iframe document HTML ex

Posted 2024-05-15 21:05:24

I'm trying to learn some Beautiful Soup and to pull some HTML data out of a few iframes, but so far I haven't had much success.

Parsing the iframe element itself doesn't seem to be a problem for BS4, but I can't get the embedded content out of it, no matter what I try.

For example, consider the iframe below (this is what I see in the Chrome developer tools):

<iframe frameborder="0" marginwidth="0" marginheight="0" scrolling="NO"
src="http://www.engineeringmaterials.com/boron/728x90.html" width="728" height="90">
#document <html>....</html></iframe>

Here, <html>...</html> is the content I'm interested in extracting.

However, when I use the following BS4 code:

iFrames = []  # quick bs4 example
for iframe in soup("iframe"):
    iFrames.append(soup.iframe.extract())

I get:

<iframe frameborder="0" marginwidth="0" marginheight="0" scrolling="NO" src="http://www.engineeringmaterials.com/boron/728x90.html" width="728" height="90">

In other words, the iframe I get back does not contain the <html>...</html> document.

I also tried something like this:

iFrames = []  # quick bs4 example
iframexx = soup.find_all('iframe')
for iframe in iframexx:
    print(iframe.find_all('html'))

...but that doesn't seem to work either.

So I guess my question is: how can I reliably extract these <html>...</html> document objects from iframe elements?

1 answer

Answer 1 · posted 2024-05-15 21:05:24

The browser loads the iframe's content in a separate request. You have to do the same:

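import urllib2  # Python 2 standard library
from bs4 import BeautifulSoup

for iframe in iframexx:
    response = urllib2.urlopen(iframe.attrs['src'])  # fetch the iframe's own document
    iframe_soup = BeautifulSoup(response)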
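
The snippet above relies on Python 2's urllib2. On Python 3 the same idea could be sketched roughly as follows. The helper names fetch_html and iframe_soups are hypothetical (they are not part of BeautifulSoup), and stripping whitespace from src and resolving it against the page URL are assumptions about what a real page may need, not something the original answer does:

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Hypothetical helper: download a URL and return the response body as bytes."""
    with urlopen(url) as response:
        return response.read()


def iframe_soups(page_html, page_url, fetch=fetch_html):
    """Hypothetical helper: yield (absolute_src, soup) for each iframe with a src."""
    page_soup = BeautifulSoup(page_html, "html.parser")
    for iframe in page_soup.find_all("iframe"):
        # Assumption: src may be missing or padded with whitespace.
        src = (iframe.get("src") or "").strip()
        if not src:
            continue  # no src attribute, so there is nothing to request
        # Resolve relative src values against the URL of the outer page.
        absolute_src = urljoin(page_url, src)
        # One extra request per iframe, just as the browser would make.
        yield absolute_src, BeautifulSoup(fetch(absolute_src), "html.parser")
```

Passing the fetch function in as a parameter keeps the parsing logic testable without network access; in normal use the default urlopen-based fetcher is used, for example `for src, frame in iframe_soups(html, url): print(src, frame.title)`.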

Remember: BeautifulSoup is not a browser; it won't fetch images, CSS, or JavaScript resources for you either.
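
A small offline check may make this concrete. The sketch below assumes the page source contains the iframe tag from the question with nothing between its opening and closing tags (the question only shows the DevTools view, where the #document node is added by the browser after it has loaded the frame). Under that assumption BeautifulSoup finds no nested html element, and the src attribute is the only pointer to the embedded document:

```python
from bs4 import BeautifulSoup

# Assumed page source: the iframe from the question, with an empty body.
# The "#document" node seen in DevTools is not part of this markup.
page_source = (
    '<iframe frameborder="0" marginwidth="0" marginheight="0" scrolling="NO" '
    'src="http://www.engineeringmaterials.com/boron/728x90.html" '
    'width="728" height="90"></iframe>'
)

soup = BeautifulSoup(page_source, "html.parser")
iframe = soup.find("iframe")

nested_html = iframe.find_all("html")  # nothing nested inside the tag
iframe_src = iframe.get("src")         # the URL that has to be fetched separately

print(nested_html)
print(iframe_src)
```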
