What is the best way to download all the resources listed in an HTML5 cache.manifest file?

7 votes
1 answer
632 views
Asked 2025-04-17 02:09

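I'm trying to understand how an HTML5 app works. Whenever I save the page in a WebKit browser (such as Chrome or Safari), the saved copy contains only some of the resources listed in the cache.manifest file, not all of them. Is there a library or some code that can parse the cache.manifest file and download all of the resources (images, scripts, stylesheets)?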

(The original code has been moved to the answer... newbie mistake >.<)

1 Answer

0

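I originally posted this as part of the question... (not something a newcomer posting on StackOverflow would normally do ;)

Since nobody answered my question, I decided to post it separately. Here it is:

I wrote a Python script to solve this, but any suggestions are welcome =) (this is my first Python code, so there may well be a better way)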

# Requires Python 3.
import os
import urllib.request

# Base URL of the directory that holds cache.manifest.
# File names are appended to it directly, so the real value must end with '/'.
cmServerURL = 'http://<serverURL>:<port>/<path-to-cache.manifest>'

# download code adapted from Stack Overflow
# http://stackoverflow.com/questions/22676/how-do-i-download-a-file-over-http-using-python
def loadURL(url, fileToSave):
    """Download url to the local path fileToSave, printing progress."""
    file_name = url.split('/')[-1]
    with urllib.request.urlopen(url) as u, open(fileToSave, 'wb') as f:
        # Content-Length is optional; 0 means the size is unknown
        file_size = int(u.headers.get('Content-Length', 0))
        print("Downloading: %s Bytes: %s" % (file_name, file_size or 'unknown'))

        file_size_dl = 0
        block_sz = 8192
        while True:
            buffer = u.read(block_sz)
            if not buffer:
                break

            file_size_dl += len(buffer)
            f.write(buffer)
            if file_size:
                status = "%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
            else:
                status = "%10d" % file_size_dl
            # '\r' moves back to the start of the line so the status overwrites itself
            print(status, end='\r')
        print()

# download the cache.manifest file
# since this request doesn't include the Content-Length header we will use a different api =P
urllib.request.urlretrieve(cmServerURL + 'cache.manifest', './cache.manifest')

# open the cache.manifest and go through it line by line, checking whether each file exists
with open('cache.manifest', 'r') as f:
    for line in f:
        # only lines containing a '/' are treated as file paths; comment lines are skipped
        if '/' in line and not line.lstrip().startswith('#'):
            # drop a leading '/' so the file is saved below the current directory
            fileName = line.strip().lstrip('/')
            # if the file doesn't exist, let's download it
            if not os.path.exists(fileName):
                print('NOT FOUND: ' + fileName)
                dirName = os.path.dirname(fileName)
                print('checking directory: ' + dirName)
                if dirName and not os.path.exists(dirName):
                    os.makedirs(dirName)
                else:
                    print('directory exists')
                print('downloading file: ' + cmServerURL + fileName)
                loadURL(cmServerURL + fileName, fileName)
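
The script above treats every line that contains a '/' as a file path, so it misses top-level files and does not distinguish the manifest's sections. Below is a rough Python 3 sketch of a somewhat stricter parser, offered as a suggestion rather than a definitive implementation. It assumes the usual manifest layout (a `CACHE MANIFEST` first line, `#` comments, and `CACHE:`, `NETWORK:`, `FALLBACK:` and `SETTINGS:` section headers) and resolves each entry against the manifest URL with `urljoin`. The function name `parse_manifest`, the sample manifest, and the `example.com` URLs are made up for illustration.

```python
from urllib.parse import urljoin


def parse_manifest(text, manifest_url):
    """Return absolute URLs of the resources a manifest asks to cache.

    Hypothetical helper: collects explicit CACHE entries plus the fallback
    resource (second token) of each FALLBACK entry. NETWORK and SETTINGS
    entries are skipped because they name nothing to download.
    """
    urls = []
    section = 'CACHE'  # entries before any section header are cache entries
    for raw in text.splitlines():
        line = raw.strip()
        # skip blank lines, comments and the "CACHE MANIFEST" signature line
        if not line or line.startswith('#') or line.startswith('CACHE MANIFEST'):
            continue
        # a section header switches how the following lines are interpreted
        if line in ('CACHE:', 'NETWORK:', 'FALLBACK:', 'SETTINGS:'):
            section = line[:-1]
            continue
        tokens = line.split()
        if section == 'CACHE':
            target = tokens[0]
        elif section == 'FALLBACK' and len(tokens) >= 2:
            target = tokens[1]
        else:
            continue
        # relative entries are resolved against the manifest's own URL
        url = urljoin(manifest_url, target)
        if url not in urls:
            urls.append(url)
    return urls


# Made-up manifest and URLs, for illustration only
sample = """CACHE MANIFEST
# v1 2012/01/01
index.html
css/style.css

NETWORK:
*

FALLBACK:
/ /offline.html

CACHE:
http://cdn.example.com/lib.js
"""

for url in parse_manifest(sample, 'http://example.com/app/cache.manifest'):
    print(url)
```

Each returned URL could then be handed to a downloader such as `loadURL`, with the local file path derived from the path component of the URL.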
