Best way to download all resources listed in an HTML5 cache.manifest file?
I'm trying to understand how an HTML5 app works. Whenever I save the page in a WebKit browser (such as Chrome or Safari), it only saves a subset of the resources listed in the cache.manifest file, not all of them. Is there a library or some code that can parse the cache.manifest file and download every resource it lists (images, scripts, stylesheets, etc.)?
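For reference, a cache.manifest is just a plain-text list of URLs under a CACHE MANIFEST header, optionally split into CACHE:, NETWORK: and FALLBACK: sections, with comment lines starting with #. An illustrative example (the paths here are made up):

CACHE MANIFEST
# v1 - example only
index.html
css/style.css
js/app.js
images/logo.png

NETWORK:
*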
(The original code has been moved into the answer... newbie mistake >.<)
1 Answer
I originally posted this as part of the question... (which newcomers to StackOverflow generally shouldn't do ;)
Since nobody answered, I decided to post it separately as an answer. Here it is:
I wrote a Python script to solve this, but any suggestions are welcome =) (This is my first Python code, so there may well be a better way to do it.)
import os
import urllib
import urllib2

cmServerURL = 'http://<serverURL>:<port>/<path-to-cache.manifest>'

# Download-file code adapted from Stack Overflow:
# http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
def loadURL(url, dirToSave):
    file_name = url.split('/')[-1]
    u = urllib2.urlopen(url)
    f = open(dirToSave, 'wb')
    meta = u.info()
    file_size = int(meta.getheaders("Content-Length")[0])
    print "Downloading: %s Bytes: %s" % (file_name, file_size)

    # Read the response in chunks and print progress as we go.
    file_size_dl = 0
    block_sz = 8192
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break
        file_size_dl += len(buffer)
        f.write(buffer)
        status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
        status = status + chr(8) * (len(status) + 1)
        print status,

    f.close()

# Download the cache.manifest file itself.
# This response doesn't include a Content-Length header, so use a different API =P
urllib.urlretrieve(cmServerURL + 'cache.manifest', './cache.manifest')

# Open cache.manifest and go through it line by line, checking whether each file already exists locally.
f = open('cache.manifest', 'r')
for line in f:
    filepath = line.split('/')
    if len(filepath) > 1:
        fileName = line.strip()
        # If the file doesn't exist locally, download it.
        if not os.path.exists(fileName):
            print 'NOT FOUND: ' + line
            dirName = os.path.dirname(fileName)
            print 'checking directory: ' + dirName
            if not os.path.exists(dirName):
                os.makedirs(dirName)
            else:
                print 'directory exists'
            print 'downloading file: ' + cmServerURL + fileName
            loadURL(cmServerURL + fileName, fileName)
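One possible improvement (just a sketch, not tested against a real manifest): the loop above keys off the presence of a '/' in each line, so comment lines or NETWORK:/FALLBACK: entries that also contain slashes would be treated as downloadable files. Filtering the manifest by its sections first should be more robust, something like:

# Sketch of stricter manifest parsing (illustrative, assumes the standard
# CACHE MANIFEST format): skip the header, comments, blank lines and section
# markers, and only keep entries from the (implicit or explicit) CACHE: section.
def cacheEntries(manifestPath):
    section = 'CACHE:'  # entries before any section marker belong to CACHE:
    entries = []
    for raw in open(manifestPath):
        entry = raw.strip()
        if not entry or entry.startswith('#') or entry == 'CACHE MANIFEST':
            continue
        if entry in ('CACHE:', 'NETWORK:', 'FALLBACK:', 'SETTINGS:'):
            section = entry
            continue
        if section == 'CACHE:':
            entries.append(entry)
    return entries

The main loop could then iterate over cacheEntries('./cache.manifest') instead of the raw file lines.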