How can I tell whether urllib.urlretrieve succeeded?

56 votes

8 answers

73849 views

Asked 2025-04-15 12:12

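When urllib.urlretrieve downloads a file from a remote HTTP server and the file does not exist, it gives no warning; it silently saves an HTML page under the file name you asked for. For example:

urllib.urlretrieve('http://google.com/abc.jpg', 'abc.jpg')

Even though abc.jpg does not exist on the google.com server, the call returns quietly, and the resulting abc.jpg is not a valid JPEG file but an HTML page. I suspect the returned headers (an httplib.HTTPMessage instance) could be used to tell whether the download succeeded, but I can't find any documentation for httplib.HTTPMessage.

Can anyone provide some information about this problem?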

error handling, response headers, http, file download, urllib, urlretrieve, html page

8 Answers

The documentation does not describe this object in any detail.

To get at the message, it looks like you need to do something like this:

a, b=urllib.urlretrieve('http://google.com/abc.jpg', r'c:\abc.jpg')

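b is the message instance.

One trick I've learned is that Python's introspection is always useful. When I type

dir(b)

I see a lot of methods and functions to play with.

Then I started trying things with b.

For example,

b.items()

lists a lot of interesting things (the response headers as name/value pairs). I suspect that playing around with them will lead you to the attribute you want to work with.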
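One header worth looking at among those items is Content-Type. Here is a minimal sketch, not a documented recipe: the helper name looks_like_image is made up for illustration, and it assumes the server labels its error page as text/html rather than as an image. It relies only on the headers object having a dict-style get, which a plain dict has and which the message object returned by urlretrieve should have as well.

```python
def looks_like_image(headers):
    """Return True if the Content-Type header names an image type."""
    # Header values may carry parameters, e.g. "text/html; charset=UTF-8".
    content_type = headers.get("Content-Type", "") or ""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type.startswith("image/")


# Hypothetical usage with the tuple returned by urlretrieve:
#   a, b = urllib.urlretrieve('http://google.com/abc.jpg', 'abc.jpg')
#   if not looks_like_image(b):
#       os.remove(a)  # what was saved is probably an error page
```

This does not prove the download is a valid JPEG; it only catches the common case where the server answers with an HTML page.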

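Sorry that this is such a beginner's answer, but I'm trying to get better at using introspection to learn, and your question just happened to come up.

Well, I tried something fun along these lines: I wondered whether I could automatically get the output of every entry in dir(b) that takes no parameters, so I wrote: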

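needparam = []
for name in dir(b):
    try:
        getattr(b, name)()  # try calling the attribute with no arguments
        print(name)         # it worked without parameters
    except Exception:
        # not callable, or it needs arguments
        needparam.append(name)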

Answered 2025-04-15 by Python大师


I keep it simple:

# Simple downloading with progress indicator, by Cees Timmerman, 16mar12.

import urllib2

remote = r"http://some.big.file"
local = r"c:\downloads\bigfile.dat"

u = urllib2.urlopen(remote)
h = u.info()
totalSize = int(h.get("Content-Length", 0))  # 0 if the server sends no Content-Length

print "Downloading %s bytes..." % totalSize,
fp = open(local, 'wb')

blockSize = 8192 #100000 # urllib.urlretrieve uses 8192
count = 0
while True:
    chunk = u.read(blockSize)
    if not chunk: break
    fp.write(chunk)
    count += 1
    if totalSize > 0:
        percent = int(count * blockSize * 100 / totalSize)
        if percent > 100: percent = 100
        print "%2d%%" % percent,
        if percent < 100:
            print "\b\b\b\b\b",  # Erase "NN% "
        else:
            print "Done."

fp.flush()
fp.close()
if not totalSize:
    print

Answered 2025-04-15 by Python大师


If you can, consider using urllib2. It is more advanced than urllib and easier to use.

You can detect any HTTP error very simply:

>>> import urllib2
>>> resp = urllib2.urlopen("http://google.com/abc.jpg")
Traceback (most recent call last):
<<MANY LINES SKIPPED>>
urllib2.HTTPError: HTTP Error 404: Not Found
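Catching that exception gives a direct success/failure signal for a download. The following is only a sketch. The function name fetch and its opener parameter are invented here (the parameter exists so the example can run without a network), and the import fallback assumes that on Python 3 the same names live in urllib.request and urllib.error.

```python
import io
import os
import tempfile

try:
    from urllib2 import urlopen, HTTPError  # Python 2
except ImportError:
    from urllib.request import urlopen  # Python 3
    from urllib.error import HTTPError


def fetch(url, path, opener=urlopen):
    """Save url to path; return True on success, False on an HTTP error."""
    try:
        resp = opener(url)
    except HTTPError:
        # e.g. 404: nothing is written, so no bogus file is left behind
        return False
    try:
        with open(path, "wb") as out:
            while True:
                chunk = resp.read(8192)
                if not chunk:
                    break
                out.write(chunk)
    finally:
        resp.close()
    return True


# Offline demonstration with stand-in openers instead of a real server.
def missing(url):
    raise HTTPError(url, 404, "Not Found", None, io.BytesIO())


def present(url):
    return io.BytesIO(b"fake image bytes")


target = os.path.join(tempfile.mkdtemp(), "abc.jpg")
print(fetch("http://google.com/abc.jpg", target, opener=missing))
print(fetch("http://google.com/abc.jpg", target, opener=present))
```

In real use you would leave opener at its default and simply check the return value of fetch(url, path).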

resp is a file-like response object that you can do a lot of useful things with:

>>> resp = urllib2.urlopen("http://google.com/")
>>> resp.code
200
>>> resp.headers["content-type"]
'text/html; charset=windows-1251'
>>> resp.read()
"<<ACTUAL HTML>>"

Answered 2025-04-15 by Python大师

