How do I add "http" to the "src" attribute?

Posted 2024-04-19 10:00:17

I'm trying to scrape content from some websites. This is the site's HTML:

<div class="answer-given-body ugc-base">
  <p><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2Fd72%2Fd72dfa6c-8e50-475a-86cf-678a04ae4606%2FphpQZYPYo.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F4c7%2F4c775a01-8590-4b93-bc20-03d282586f95%2FphpE7XFWI.png"/></p>
  </div>

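The src attributes of the img tags in the HTML above don't start with "http", so the images don't display when I save the HTML file. How can I edit each src attribute and prepend "http" to it?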


1 answer
User
#1 · Posted 2024-04-19 10:00:17

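The src values are protocol-relative URLs (they start with "//"), so they only need a scheme in front. To add "https" to each src, access the tag's src attribute with [] and prepend "https:" to it, as follows: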

from bs4 import BeautifulSoup


html = """
<div class="answer-given-body ugc-base">
  <p><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2Fd72%2Fd72dfa6c-8e50-475a-86cf-678a04ae4606%2FphpQZYPYo.png"/><img alt="" src="//d2vlcm61l7u1fs.cloudfront.net/media%2F4c7%2F4c775a01-8590-4b93-bc20-03d282586f95%2FphpE7XFWI.png"/></p>
  </div>
"""

soup = BeautifulSoup(html, "html.parser")

# Select all the `img` tags
for tag in soup.select(".answer-given-body.ugc-base img"):
    tag["src"] = "https:" + tag["src"]

print(soup.prettify())

Output:

<div class="answer-given-body ugc-base">
 <p>
  <img alt="" src="https://d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png"/>
  <img alt="" src="https://d2vlcm61l7u1fs.cloudfront.net/media%2Fd72%2Fd72dfa6c-8e50-475a-86cf-678a04ae4606%2FphpQZYPYo.png"/>
  <img alt="" src="https://d2vlcm61l7u1fs.cloudfront.net/media%2F4c7%2F4c775a01-8590-4b93-bc20-03d282586f95%2FphpE7XFWI.png"/>
 </p>
</div>
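
The loop above prepends "https:" unconditionally, which would mangle any src that is already absolute. A slightly more defensive variant, sketched here with only the standard library, resolves each src against the URL of the page it came from using `urllib.parse.urljoin`. The `BASE_URL` value and the `absolutize` helper are hypothetical names chosen for illustration; the question does not say which page was scraped.

```python
from urllib.parse import urljoin

# Hypothetical base URL: stands in for the page the HTML was scraped from
# (the real address is not given in the question).
BASE_URL = "https://example.com/questions/1"


def absolutize(src: str, base_url: str = BASE_URL) -> str:
    """Resolve src against base_url.

    Protocol-relative URLs ("//host/path") inherit the base URL's scheme,
    absolute URLs are returned unchanged, and site-relative paths are
    joined onto the base URL's host.
    """
    return urljoin(base_url, src)


samples = [
    "//d2vlcm61l7u1fs.cloudfront.net/media%2F61d%2F61d6042d-e4dd-41d9-9a5c-0ceb481ddbc9%2FphpKFGb9B.png",
    "https://example.org/already-absolute.png",
    "/static/logo.png",
]
for src in samples:
    print(absolutize(src))
```

If this fits your case, the body of the loop in the answer could become `tag["src"] = absolutize(tag["src"])`, with `BASE_URL` set to the actual page address.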
