Web scraping a URL from a div in Python (BeautifulSoup)

<div class="theoplayer-poster" style="z-index: 1; display: inline-block; vertical-align: middle; background-repeat: no-repeat; background-position: 50% 50%; background-size: contain; cursor: pointer; margin: 0px; padding: 0px; position: absolute; top: 0px; right: 0px; bottom: 0px; left: 0px; height: 100%; background-image: url("//cdn.cnn.com/cnnnext/dam/assets/180424173851-ten-0425-00011501-exlarge-169.jpg");"></div>

1 answer

User

#1 · Posted 2024-04-19 08:45:50

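I think your main problem is that the URL you specified does not contain a div with that class name. The code below processes the contents of the URL, and hopefully it explains enough to show how to parse out what you want.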

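FYI, a quick print of soup gives you the whole document as text. Send it to the clipboard and drop it into an editor, where you can highlight text and search for the URL you are after. Then look back up from the match to see which div class (or other attribute) it sits in.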
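The same search can be done in code instead of an editor. This is only a minimal sketch: the helper name `find_attrs_containing` is hypothetical, and it assumes the URL you want appears inside some tag attribute of the HTML that was actually fetched.

```python
from bs4 import BeautifulSoup


def find_attrs_containing(html, needle):
    """List (tag name, classes, attribute name) for attributes containing needle."""
    soup = BeautifulSoup(html, "html.parser")
    hits = []
    for tag in soup.find_all(True):  # True matches every tag
        for name, value in tag.attrs.items():
            # Multi-valued attributes such as class come back as lists.
            text = " ".join(value) if isinstance(value, list) else value
            if needle in text:
                hits.append((tag.name, tag.get("class", []), name))
    return hits
```

Calling it with the page source and a fragment of the image address (for example "cdn.cnn.com") shows which tag, class and attribute to target with soup.find.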

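Also, regarding JavaScript: urlopen does not execute JS for you; only a browser object can do that. If the string you want is only inserted into the DOM by JS, I suspect you are out of luck with this approach.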

from urllib.request import urlopen  # Python 3 location of urlopen
from bs4 import BeautifulSoup

# Example of the div this function looks for:
# <div class="js-gigya-sharebar gigya-sharebar" data-description="April 25, 2018" data-image-src="//cdn.cnn.com/cnnnext/dam/assets/180424173851-ten-0425-00011501-super-tease.jpg" data-isshorturl="true" data-link="https://cnn.it/2HVJmx0" data-subtitle="" data-title="CNN 10 - April 25, 2018" data-twitter-account="CNN"></div>


def cnn_get_thumb(cnn_url):
    # urlopen fetches the raw HTML only; no JavaScript is executed.
    page = urlopen(cnn_url)
    soup = BeautifulSoup(page, 'html.parser')
    # The share bar div carries the thumbnail URL in its data-image-src attribute.
    img = soup.find('div', class_="js-gigya-sharebar")['data-image-src']
    return img


print(cnn_get_thumb("http://cnn.com/2018/04/24/cnn10/ten-content-weds/index.html"))
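If the theoplayer-poster div from the question does show up in the markup you end up with (for example in page source saved from a browser), the image address sits inside the inline style attribute rather than in an attribute of its own. A rough sketch of pulling it out, where the helper name `url_from_style` and the default base URL are assumptions made for illustration:

```python
import re
from urllib.parse import urljoin

# Matches url(...) with optional single or double quotes around the address.
_CSS_URL = re.compile(r"""url\(\s*['"]?(.*?)['"]?\s*\)""")


def url_from_style(style, base="https://cnn.com/"):
    """Return the first url(...) target in an inline style string, or None."""
    match = _CSS_URL.search(style)
    if match is None:
        return None
    # urljoin turns a protocol-relative "//host/path" into "https://host/path".
    return urljoin(base, match.group(1))
```

With BeautifulSoup, the input would be the style value of the matched tag, e.g. soup.find('div', class_='theoplayer-poster')['style'], provided that div is actually present in the fetched HTML.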
