Downloading an image from a web page with Python

3 votes

1 answer

6468 views

数据工程师

Asked 2025-04-17 18:45

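I'm trying to write a Python script that downloads an image from a web page. I'm using NASA's picture of the day page, where a new picture is posted every day under a different file name.

My solution is to parse the page's HTML with HTMLParser, look for "jpg", and store the image's path and file name in an attribute of the parser object (I called it "output"; see the code below).

I'm new to Python and to object-oriented programming (this is my first real Python script), so I'm not sure whether this is how it's usually done. Any advice and pointers are welcome.

Here is my code: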

import urllib2                      # Python 2 only
from HTMLParser import HTMLParser   # Python 2 only

# Grab image url
response = urllib2.urlopen('http://apod.nasa.gov/apod/astropix.html')
html = response.read()

class MyHTMLParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # Only parse the 'anchor' tag.
        if tag == "a":
            # Check the list of defined attributes.
            for name, value in attrs:
                # If href is defined and ends in "jpg", remember it.
                if name == "href":
                    if value[len(value)-3:len(value)] == "jpg":
                        #print value
                        self.output = value  # store the path + file name of the image

parser = MyHTMLParser()
parser.feed(html)
imgurl = 'http://apod.nasa.gov/apod/' + parser.output


1 Answer

To check whether a string ends with "jpg", you can use the .endswith() method instead of len() and slicing.

if name == "href" and value.endswith("jpg"):
   self.output = value
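
As a small illustration of what .endswith() makes easy, the sketch below wraps the check in a helper. The name is_jpeg_link and the list of extensions are assumptions made for this example; they are not part of the original script.

```python
def is_jpeg_link(href):
    """Return True if href looks like a link to a JPEG file.

    Hypothetical helper for illustration; the extension list is an assumption.
    """
    # str.endswith accepts a tuple of suffixes, and lower() makes the
    # check case-insensitive so that "GALAXY.JPG" is matched as well.
    return href.lower().endswith((".jpg", ".jpeg"))


print(is_jpeg_link("image/2504/galaxy.jpg"))
print(is_jpeg_link("image/2504/GALAXY.JPG"))
print(is_jpeg_link("ap250417.html"))
```

Including the dot in the suffix also avoids matching a link that merely happens to end in the letters "jpg".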

If the search within the page is more complex, you can use lxml.html or BeautifulSoup instead of HTMLParser, for example:

from lxml import html

# download & parse web page
doc = html.parse('http://apod.nasa.gov/apod/astropix.html').getroot()

# find <a href that ends with ".jpg" and 
# that has <img child that has src attribute that also ends with ".jpg"
for elem, attribute, link, _ in doc.iterlinks():
    if (attribute == 'href' and elem.tag == 'a' and link.endswith('.jpg') and
        len(elem) > 0 and elem[0].tag == 'img' and
        elem[0].get('src', '').endswith('.jpg')):
        print(link)
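
For completeness, here is a rough Python 3 sketch of the whole task using only the standard library (the code in the question targets Python 2's urllib2). The names JpgLinkParser, find_jpg_links and download_first_jpg, as well as the sample HTML, are hypothetical. It assumes the page links to the full-size image through an <a href="...jpg"> tag, as the question describes.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen, urlretrieve


class JpgLinkParser(HTMLParser):
    """Collect the href of every <a> tag that points at a .jpg file."""

    def __init__(self):
        super().__init__()
        self.links = []  # starts empty, so "no image found" is easy to detect

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # value is None for attributes given without a value, e.g. <a href>
            if name == "href" and value and value.lower().endswith(".jpg"):
                self.links.append(value)


def find_jpg_links(html_text, base_url):
    """Return absolute URLs of all .jpg links found in html_text."""
    parser = JpgLinkParser()
    parser.feed(html_text)
    # urljoin resolves relative paths such as "image/x.jpg" against base_url
    return [urljoin(base_url, link) for link in parser.links]


def download_first_jpg(page_url):
    """Save the first linked .jpg on page_url into the current directory.

    Returns the local file name, or None if the page has no .jpg link.
    """
    with urlopen(page_url) as response:
        html_text = response.read().decode("utf-8", errors="replace")
    links = find_jpg_links(html_text, page_url)
    if not links:
        return None
    filename = os.path.basename(links[0])
    urlretrieve(links[0], filename)
    return filename


# Offline check with made-up HTML; this part needs no network access.
sample = '<a href="image/2504/example.jpg"><img src="image/2504/example_small.jpg"></a>'
print(find_jpg_links(sample, "http://apod.nasa.gov/apod/astropix.html"))
```

Calling download_first_jpg('http://apod.nasa.gov/apod/astropix.html') would then save the image locally, assuming the page is reachable and still laid out this way. Keeping the links in a list rather than a single attribute means the script does not fail with an AttributeError when no matching link is found.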

Answered 2025-04-17 by Python大师
