Download all the links (related documents) on a webpage using Python
I need to download a lot of documents from a webpage. They are wmv files, PDF files, BMP files, and so on. All of them have links, of course. At the moment I have to right-click each file, choose "Save Link As", and save it with the type "All Files". Is it possible to do this in Python? I have searched Stack Overflow, and people have answered how to get the links from a webpage, but I want to download the actual files. Thanks in advance. (This is not a homework question :))
2 Answers
6
- Have a look at the Python code in this link: wget-vs-urlretrieve-of-python (see the urlretrieve sketch after this list).
- You can also do this quite simply with Wget itself. Try the --accept, --recursive, and --level options, for example: wget --accept wmv,doc --level=2 --recursive http://www.example.com/files/
25
Here is an example of how you could download some files of your choosing from http://pypi.python.org/pypi/xlwt.
First you will need to install mechanize, available here: http://wwwsearch.sourceforge.net/mechanize/download.html
import mechanize
from time import sleep

# Make a Browser (think of this as Chrome or Firefox etc.)
br = mechanize.Browser()

# Visit http://stockrt.github.com/p/emulating-a-browser-in-python-with-mechanize/
# for more ways to set up your br browser object, e.g. so it looks like Mozilla,
# and for how to fill out forms with passwords.

# Open your site
br.open('http://pypi.python.org/pypi/xlwt')

f = open("source.html", "w")
f.write(br.response().read())  # can be helpful for debugging
f.close()

filetypes = [".zip", ".exe", ".tar.gz"]  # you will need to do some kind of pattern matching on your files
myfiles = []
for l in br.links():  # you can also iterate through br.forms() to print the forms on the page!
    for t in filetypes:
        if t in str(l):  # check if this link has a file extension we want (you may prefer regular expressions)
            myfiles.append(l)

def downloadlink(l):
    # Open in binary mode; you should probably also check that the file doesn't already exist.
    f = open(l.text, "wb")
    br.follow_link(l)  # fetch the linked file
    f.write(br.response().read())
    f.close()
    print l.text, "has been downloaded"
    # br.back()

for l in myfiles:
    sleep(1)  # throttle so you don't hammer the site
    downloadlink(l)
Note: in some cases you may want br.click_link(l) instead of br.follow_link(l). The difference is that click_link returns a Request object, whereas follow_link opens the link directly. For more information, see: Difference between br.click_link() and br.follow_link() in Mechanize.
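For illustration, a minimal sketch of that difference, assuming the br browser and a link l from the example above:

# follow_link finds the link and opens it in one step:
br.follow_link(l)

# click_link only builds a mechanize Request object,
# which you then have to open yourself:
req = br.click_link(l)
br.open(req)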