Download image, save to folder, check if file exists

4 votes

2 answers

7727 views

Asked 2025-04-16 02:09

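I have a recordset of products (using sqlalchemy) that I am looping over, and for each product I want to download an image and save it to a folder.

If the folder does not exist, I want to create it first.

I also want to check first whether the image file already exists in the folder. If it does, skip the download and move on to the next row.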

/myscript.py
/images/

I want the images folder to be in the same directory as my script file, wherever that is stored.

So far I have:

import urllib2

q = session.query(products)

for p in q:
    if p.url:
        req = urllib2.Request(p.url)
        try:
            response = urllib2.urlopen(req)
            image = response.read()

            ???
        except urllib2.URLError as e:
            print e

2 Answers

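The file name should be in response.info()['Content-Disposition'] (the string contains a filename=something part after a semicolon). If that header is missing, or has no semicolon, or has no filename part, you can use urlparse.urlsplit(p.url) and take the os.path.basename of the last non-empty component (or, more simply, though it may upset some people, just use p.url.split('/')[-1] ;-).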
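The file-name lookup described above can be sketched as follows. This is a minimal Python 3 sketch, not the answerer's code: the helper name guess_filename is hypothetical, urllib.parse stands in for the Python 2 urlparse module, and the regular expression only handles the plain filename=something form (not the extended filename*= form).

```python
import os
import re
from urllib.parse import unquote, urlsplit


def guess_filename(url, content_disposition=None):
    """Return a file name from a Content-Disposition value, else from the URL path.

    Hypothetical helper for illustration only.
    """
    if content_disposition:
        # Look for filename=something (optionally quoted) after a semicolon.
        match = re.search(r';\s*filename="?([^";]+)"?', content_disposition)
        if match:
            # basename() drops any directory part the server may have sent.
            return os.path.basename(match.group(1).strip())
    # Fallback: the last non-empty component of the URL path.
    path = urlsplit(url).path.rstrip("/")
    return os.path.basename(unquote(path))
```

With a Python 3 response object, the header value would typically come from response.headers.get('Content-Disposition'), which returns None when the header is absent.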

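So much for the file name; let's call it fn.

The directory containing your script is available as sd = os.path.dirname(__file__).

Its images subdirectory is then sdsd = os.path.join(sd, 'images').

To check whether that subdirectory exists, and create it if it does not: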

if not os.path.exists(sdsd): os.makedirs(sdsd)
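For readers on Python 3, the check-then-create step can be collapsed into one call, since os.makedirs accepts exist_ok=True there. The sketch below assumes Python 3; the helper name images_dir_for and its name parameter are hypothetical.

```python
import os


def images_dir_for(script_file, name="images"):
    """Create (if needed) and return the `name` folder next to script_file.

    Hypothetical helper; pass __file__ as script_file from the calling script.
    """
    # abspath() guards against __file__ being a bare relative name.
    script_dir = os.path.dirname(os.path.abspath(script_file))
    target = os.path.join(script_dir, name)
    # exist_ok=True makes this a no-op when the folder is already there.
    os.makedirs(target, exist_ok=True)
    return target
```

Calling images_dir_for(__file__) from myscript.py would then return the path of the images folder beside it.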

To check whether the file you want to write already exists:

if os.path.exists(os.path.join(sdsd, fn)): ...

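All of this code goes where you marked ???. It is a fair amount of code, so it is best made into a function that takes p.url and response as arguments (it can read the image itself ;-). If you might later move that function into a separate module, consider also passing __file__ in as an argument (I recommend this!).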
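One way such a function might look is sketched below, assuming Python 3. The name save_image and its return convention are hypothetical, and it uses the simplest file-name choice (the last part of the URL path) rather than the Content-Disposition header.

```python
import os
from urllib.parse import urlsplit


def save_image(url, response, script_file):
    """Save the response body under <script dir>/images unless it already exists.

    Returns the target path, or None when the file was already there.
    Hypothetical helper; `response` only needs a read() method.
    """
    # File name: last non-empty component of the URL path.
    fn = os.path.basename(urlsplit(url).path.rstrip("/"))
    if not fn:
        raise ValueError("cannot derive a file name from %r" % url)
    sd = os.path.dirname(os.path.abspath(script_file))
    sdsd = os.path.join(sd, "images")
    os.makedirs(sdsd, exist_ok=True)
    target = os.path.join(sdsd, fn)
    if os.path.exists(target):
        return None  # already downloaded: skip without reading the body
    with open(target, "wb") as out:
        out.write(response.read())
    return target
```

Inside the loop this would be called as save_image(p.url, response, __file__). When the file name comes from the URL alone, the existence check could also run before the request is opened, which would avoid the network round trip for files that are already present.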

Of course, you need import os for all these os and os.path calls, plus import urlparse if you decide to use that standard library module as well.

Answered 2025-04-16 by Python大师

I think you can simply use urllib.urlretrieve to solve this:

import errno
import os
import urllib

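def require_dir(path):
    # Create the directory (and any missing parents) if it is not there yet.
    try:
        os.makedirs(path)
    except OSError as exc:
        # Re-raise anything other than "directory already exists".
        if exc.errno != errno.EEXIST:
            raise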

directory = os.path.join(os.path.dirname(os.path.abspath(__file__)), "images")
require_dir(directory)
filename = os.path.join(directory, "stackoverflow.html")

if not os.path.exists(filename):
    urllib.urlretrieve("http://stackoverflow.com", filename)
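
The code above is written for Python 2. A rough Python 3 equivalent might look like the sketch below, where urlretrieve lives in urllib.request and os.makedirs(exist_ok=True) replaces require_dir. The function name fetch_once and its retrieve parameter (which lets the downloader be swapped out) are hypothetical.

```python
import os
from urllib.request import urlretrieve


def fetch_once(url, directory, name, retrieve=urlretrieve):
    """Download url to directory/name unless that file is already there.

    Returns True if a download happened, False if it was skipped.
    """
    os.makedirs(directory, exist_ok=True)
    filename = os.path.join(directory, name)
    if os.path.exists(filename):
        return False
    retrieve(url, filename)
    return True


# Example call (performs a real network request, so it is left commented out):
# directory = os.path.join(os.path.dirname(os.path.abspath(__file__)), "images")
# fetch_once("http://stackoverflow.com", directory, "stackoverflow.html")
```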

Answered 2025-04-16 by Python大师
