Python: How to download a zip file
I am trying to download a zip file with this code:
o = urllib2.build_opener( urllib2.HTTPCookieProcessor() )
#login
p = urllib.urlencode( { usernameField: usernameVal, passField: passVal } )
f = o.open(authUrl, p )
data = f.read()
print data
f.close()
#download file
f = o.open(remoteFileUrl)
localFile = open(localFile, "wb")
localFile.write(f.read())
f.close()
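I get some binary data, but the file I "download" is too small and is not a valid zip file. Am I not retrieving the zip file correctly? Below is the HTTP response header I get for f = o.open(remoteFileUrl). I don't know whether this response needs any special handling: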
HTTP/1.1 200 OK
Server: Apache-Coyote/1.1
Pragma: private
Cache-Control: must-revalidate
Expires: Tue, 31 Dec 1997 23:59:59 GMT
Content-Disposition: inline; filename="files.zip";
Content-Type: application/zip
Transfer-Encoding: chunked
4 Answers
1
Here is a more robust solution that uses urllib2 to download the file in chunks and print the download status.
import os
import urllib2
import math

def downloadChunks(url):
    """Helper to download large files.

    The only arg is a url.
    The file goes to a temp directory and is downloaded
    in chunks, printing the percentage downloaded so far.
    """
    baseFile = os.path.basename(url)

    # save under /tmp; umask 0002 keeps new files group-writable
    os.umask(0002)
    temp_path = "/tmp/"
    try:
        file = os.path.join(temp_path, baseFile)
        req = urllib2.urlopen(url)
        # assumes the server sends a Content-Length header
        total_size = int(req.info().getheader('Content-Length').strip())
        downloaded = 0
        CHUNK = 256 * 10240
        with open(file, 'wb') as fp:
            while True:
                chunk = req.read(CHUNK)
                if not chunk:
                    break
                fp.write(chunk)
                downloaded += len(chunk)
                # float() avoids Python 2 integer division, which would always print 0
                print math.floor((float(downloaded) / total_size) * 100)
    except urllib2.HTTPError, e:
        print "HTTP Error:", e.code, url
        return False
    except urllib2.URLError, e:
        print "URL Error:", e.reason, url
        return False

    return file
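The response headers in the question show Transfer-Encoding: chunked and no Content-Length, in which case getheader('Content-Length') returns None and the function above would fail before downloading anything. Below is a minimal sketch of a progress loop that tolerates a missing length. It is written for Python 3 and uses an in-memory stream in place of a real HTTP response; the name copy_with_progress and its parameters are illustrative and not part of the answer.

```python
import io


def copy_with_progress(response, out_file, total_size=None, chunk_size=64 * 1024):
    """Copy a file-like response to out_file in chunks, printing progress.

    total_size is the Content-Length value if the server sent one, else None.
    Returns the number of bytes written.
    """
    downloaded = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:
            break
        out_file.write(chunk)
        downloaded += len(chunk)
        if total_size:
            # Integer percentage of the known total.
            print("%d%%" % (downloaded * 100 // total_size))
        else:
            # No Content-Length (e.g. chunked transfer): report bytes only.
            print("%d bytes" % downloaded)
    return downloaded


# Usage with an in-memory stand-in for the HTTP response object.
source = io.BytesIO(b"x" * 150000)
target = io.BytesIO()
written = copy_with_progress(source, target, total_size=None)
```

With a real response from urllib.request.urlopen, total_size could probably be taken from response.headers.get("Content-Length") and converted with int() when it is not None.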
1
If you don't mind reading the whole zip file into memory, the fastest way to read and write it is as follows:
data = f.readlines()
with open(localFile,'wb') as output:
    output.writelines(data)
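If you want to read and write the data as it arrives, you can use the following instead:
with open(localFile, "wb") as output:
    # pass a size so that each read() returns at most one chunk
    chunk = f.read(8192)
    while chunk:
        output.write(chunk)
        chunk = f.read(8192)
This is a little more work, but it avoids holding the whole file in memory at once. Hope it helps.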
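The same chunked copy can also be delegated to the standard library. This is a small sketch, assuming Python 3 and using shutil.copyfileobj, which reads from one file-like object and writes to another in blocks of the given size. The helper name save_stream and the temporary path are hypothetical, and an in-memory stream stands in for the HTTP response.

```python
import io
import os
import shutil
import tempfile


def save_stream(response, local_path, chunk_size=16 * 1024):
    """Write a file-like response to local_path, chunk_size bytes at a time."""
    with open(local_path, "wb") as output:
        shutil.copyfileobj(response, output, chunk_size)


# Usage: random bytes stand in for the remote zip content.
payload = b"PK" + os.urandom(100000)
path = os.path.join(tempfile.mkdtemp(), "files.zip")
save_stream(io.BytesIO(payload), path)
```

Because the output file is opened in a with block, it is flushed and closed before the function returns, so the file on disk is complete at that point.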
10
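f.read() doesn't necessarily read the whole file; it only reads a packet of it (which may be the whole file if it is small, but won't be if the file is large).
You need to loop over the packets like this: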
while 1:
    packet = f.read()
    if not packet:
        break
    localFile.write(packet)
f.close()
f.read() returns an empty packet to signify that you have read the whole file.
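Since the question mentions that the result is not a valid zip file, it may help to check the downloaded bytes once the loop finishes. The following is a rough sketch, assuming Python 3: it builds a tiny archive in memory to stand in for the remote file, reads it back with the loop described above, and checks it with zipfile.is_zipfile. The function name read_all and the file name hello.txt are made up for illustration.

```python
import io
import zipfile


def read_all(response, chunk_size=8192):
    """Call read() until it returns an empty bytes object, then join the pieces."""
    pieces = []
    while True:
        packet = response.read(chunk_size)
        if not packet:
            break
        pieces.append(packet)
    return b"".join(pieces)


# Build a small zip archive in memory to stand in for the remote file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")
buffer.seek(0)

# Read it back in deliberately tiny packets, then validate the result.
data = read_all(buffer, chunk_size=16)
is_valid = zipfile.is_zipfile(io.BytesIO(data))
names = zipfile.ZipFile(io.BytesIO(data)).namelist()
```

If is_valid comes back False for a real download, looking at the first few bytes of the saved file (for example, whether they look like an HTML login page instead of starting with PK) is one way to narrow down what the server actually returned.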