在Python SimpleHTTPServer中下载整个目录

4

没有简单的一行代码可以做到这一点。还有，你说的“下载整个目录”是指打包成tar文件还是zip文件呢？

不过，你可以按照以下步骤来操作：

从SimpleHTTPRequestHandler这个类派生一个新类，或者直接复制它的代码。
修改list_directory方法，让它返回一个“下载整个文件夹”的链接。
修改copyfile方法，这样在你的链接中就可以把整个目录打包成zip文件并返回。
你可以缓存这个zip文件，这样就不用每次都重新打包文件夹，而是检查一下有没有文件被修改。

这会是一个有趣的练习哦 :)

回答于 2025-04-15 由 Python大师

分享举报

6

我为你做了这个修改，我不知道有没有更好的方法，但：

只需保存这个文件（例如：ThreadedHTTPServer.py），然后这样访问：

$ python -m /path/to/ThreadedHTTPServer PORT

BPaste 原始版本

这个修改也支持多线程，所以你在下载和浏览的时候不会有问题，代码虽然不太整齐，但：

from BaseHTTPServer import HTTPServer, BaseHTTPRequestHandler
from SocketServer import ThreadingMixIn
import threading
import SimpleHTTPServer
import sys, os, zipfile

PORT = int(sys.argv[1])

def send_head(self):
    """Common code for GET and HEAD commands.

    This sends the response code and MIME headers.

    Return value is either a file object (which has to be copied
    to the outputfile by the caller unless the command was HEAD,
    and must be closed by the caller under all circumstances), or
    None, in which case the caller has nothing further to do.

    """
    path = self.translate_path(self.path)
    f = None

    if self.path.endswith('?download'):

        tmp_file = "tmp.zip"
        self.path = self.path.replace("?download","")

        zip = zipfile.ZipFile(tmp_file, 'w')
        for root, dirs, files in os.walk(path):
            for file in files:
                if os.path.join(root, file) != os.path.join(root, tmp_file):
                    zip.write(os.path.join(root, file))
        zip.close()
        path = self.translate_path(tmp_file)

    elif os.path.isdir(path):

        if not self.path.endswith('/'):
            # redirect browser - doing basically what apache does
            self.send_response(301)
            self.send_header("Location", self.path + "/")
            self.end_headers()
            return None
        else:

            for index in "index.html", "index.htm":
                index = os.path.join(path, index)
                if os.path.exists(index):
                    path = index
                    break
            else:
                return self.list_directory(path)
    ctype = self.guess_type(path)
    try:
        # Always read in binary mode. Opening files in text mode may cause
        # newline translations, making the actual size of the content
        # transmitted *less* than the content-length!
        f = open(path, 'rb')
    except IOError:
        self.send_error(404, "File not found")
        return None
    self.send_response(200)
    self.send_header("Content-type", ctype)
    fs = os.fstat(f.fileno())
    self.send_header("Content-Length", str(fs[6]))
    self.send_header("Last-Modified", self.date_time_string(fs.st_mtime))
    self.end_headers()
    return f

def list_directory(self, path):

    try:
        from cStringIO import StringIO
    except ImportError:
        from StringIO import StringIO
    import cgi, urllib

    """Helper to produce a directory listing (absent index.html).

    Return value is either a file object, or None (indicating an
    error).  In either case, the headers are sent, making the
    interface the same as for send_head().

    """
    try:
        list = os.listdir(path)
    except os.error:
        self.send_error(404, "No permission to list directory")
        return None
    list.sort(key=lambda a: a.lower())
    f = StringIO()
    displaypath = cgi.escape(urllib.unquote(self.path))
    f.write('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">')
    f.write("<html>\n<title>Directory listing for %s</title>\n" % displaypath)
    f.write("<body>\n<h2>Directory listing for %s</h2>\n" % displaypath)
    f.write("<a href='%s'>%s</a>\n" % (self.path+"?download",'Download Directory Tree as Zip'))
    f.write("<hr>\n<ul>\n")
    for name in list:
        fullname = os.path.join(path, name)
        displayname = linkname = name
        # Append / for directories or @ for symbolic links
        if os.path.isdir(fullname):
            displayname = name + "/"
            linkname = name + "/"
        if os.path.islink(fullname):
            displayname = name + "@"
            # Note: a link to a directory displays with @ and links with /
        f.write('<li><a href="%s">%s</a>\n'
                % (urllib.quote(linkname), cgi.escape(displayname)))
    f.write("</ul>\n<hr>\n</body>\n</html>\n")
    length = f.tell()
    f.seek(0)
    self.send_response(200)
    encoding = sys.getfilesystemencoding()
    self.send_header("Content-type", "text/html; charset=%s" % encoding)
    self.send_header("Content-Length", str(length))
    self.end_headers()
    return f

Handler = SimpleHTTPServer.SimpleHTTPRequestHandler
Handler.send_head = send_head
Handler.list_directory = list_directory

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """Handle requests in a separate thread."""

if __name__ == '__main__':
    server = ThreadedHTTPServer(('0.0.0.0', PORT), Handler)
    print 'Starting server, use <Ctrl-C> to stop'
    server.serve_forever()

回答于 2025-04-15 由 Python大师

分享举报

5

看看这些源代码，比如在线的这里。现在，如果你用一个指向目录的URL来调用服务器，它会提供这个目录下的index.html文件，或者如果没有这个文件，就会调用list_directory方法。你可能想要的是把这个目录的内容打包成一个zip文件（我想是递归地，也就是包括所有子目录的内容），然后提供这个文件？显然，这不是简单的一行代码就能解决的，因为你需要替换掉现在的68到80行（在send_head方法中），还有整个list_directory方法，98到137行——这已经是超过50行的改动了;-）。

如果你愿意做几十行的修改，而不是一行，并且我描述的功能正是你想要的，那么你当然可以使用cStringIO.StringIO对象来构建所需的zip文件，使用ZipFile类，并通过对相关目录使用os.walk来填充它（假设你想要递归地获取所有子目录）。但这绝对不是一行代码就能搞定的;-）。

回答于 2025-04-15 由 Python大师

分享举报

在Python SimpleHTTPServer中下载整个目录

5 个回答

撰写回答