Download a PDF with urllib?

Asked 2025-04-18 13:59

I'm trying to download a PDF file from a website using urllib. Here is the code I have so far:

import urllib

def download_file(download_url):
    web_file = urllib.urlopen(download_url)
    local_file = open('some_file.pdf', 'w')
    local_file.write(web_file.read())
    web_file.close()
    local_file.close()

if __name__ == 'main':
    download_file('http://www.example.com/some_file.pdf')

When I run this code, the resulting PDF file is empty. What am I doing wrong?

urllib network-requests pdf-download

6 Answers

I'd suggest the following code:

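import shutil
import urllib.request

url = "http://www.example.com/some_file.pdf"  # URL of the PDF to download
output_file = "name.pdf"                       # local path (directory + file name) to save to

# Open the URL and a local file in binary mode ('wb'), then stream the body across.
with urllib.request.urlopen(url) as response, open(output_file, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)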
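A sketch of the same approach packaged as a reusable function. The name `download_pdf` and the 64 KiB `chunk_size` default are assumptions for illustration, not part of the answer; `shutil.copyfileobj` takes the chunk size through its `length` parameter, so large PDFs are copied piece by piece instead of being read into memory in one go.

```python
import shutil
from urllib.request import urlopen


def download_pdf(url, output_file, chunk_size=64 * 1024):
    """Stream the body at `url` into `output_file` (hypothetical helper name)."""
    # 'wb' is essential: the response yields bytes, and a PDF must be written unchanged.
    with urlopen(url) as response, open(output_file, "wb") as out_file:
        # copyfileobj reads at most chunk_size bytes per iteration and writes each chunk out.
        shutil.copyfileobj(response, out_file, length=chunk_size)
    return output_file
```

Usage: `download_pdf("http://www.example.com/some_file.pdf", "some_file.pdf")`.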

Answered 2025-04-18 by Python大师

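I tried the code above. It works in many cases, but for some sites that serve embedded PDFs you may get an error such as HTTPError: HTTP Error 403: Forbidden. These sites have server-side security features that block known bots, and by default urllib identifies itself with a User-Agent header like Python-urllib/3.3. So I suggest adding a custom User-Agent header to the urllib Request, as in the sample code below.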

from urllib.request import Request, urlopen

url = "https://realpython.com/python-tricks-sample-pdf"
# Send a browser-like User-Agent instead of urllib's default "Python-urllib/3.x"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

# Pass the Request (not the bare URL) to urlopen so the header is actually sent
with urlopen(req) as response, open("<location to dump pdf>/<name of file>.pdf", "wb") as out_file:
    out_file.write(response.read())
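A sketch that wraps the header trick in two small functions. The names `build_request`, `download_with_user_agent`, and `BROWSER_USER_AGENT` are hypothetical, and `Mozilla/5.0` is only an assumed value; some servers check more than the User-Agent, so this is not guaranteed to get past every 403.

```python
from urllib.request import Request, urlopen

# Hypothetical browser-like value; which strings a server accepts varies by site.
BROWSER_USER_AGENT = "Mozilla/5.0"


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Wrap `url` in a Request that replaces urllib's default Python-urllib/3.x User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


def download_with_user_agent(url, path, user_agent=BROWSER_USER_AGENT):
    """Fetch `url` with the custom header and save the body to `path` in binary mode."""
    with urlopen(build_request(url, user_agent)) as response, open(path, "wb") as out_file:
        out_file.write(response.read())
    return path
```

`Request` normalizes header names with `str.capitalize()`, so the header reads back as `build_request(url).get_header("User-agent")`.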

Answered 2025-04-18 by Python大师

Try urllib.request.urlretrieve (Python 3) and just do this:

from urllib.request import urlretrieve

def download_file(download_url):
    # urlretrieve opens the local file in binary mode and copies the response into it
    urlretrieve(download_url, 'path_to_save_plus_some_file.pdf')

if __name__ == '__main__':
    download_file('http://www.example.com/some_file.pdf')
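`urlretrieve` also returns a `(filename, headers)` tuple, which is handy for checking what the server actually sent. A sketch assuming you want the reported Content-Type back; the `path` parameter and the return value are hypothetical additions to the answer's function. The Python documentation lists `urlretrieve` among the legacy interfaces ported from Python 2 that might become deprecated, so the `urlopen` variants are the longer-term option.

```python
from urllib.request import urlretrieve


def download_file(download_url, path="some_file.pdf"):
    """Save download_url to path and report the Content-Type the server declared."""
    # urlretrieve writes the body to `path` in binary mode and returns (filename, headers).
    filename, headers = urlretrieve(download_url, path)
    return filename, headers.get_content_type()
```

If the returned type is `text/html` rather than `application/pdf`, the server probably sent an error or landing page instead of the file.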

Answered 2025-04-18 by Python大师

Change open('some_file.pdf', 'w') to open('some_file.pdf', 'wb'). A PDF is a binary file, so you need the 'b'. In general, you need it for basically any file you can't open in a text editor.
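A minimal sketch of why the `'b'` matters under Python 3, where `read()` on a response returns `bytes`. `PDF_BYTES`, `save_binary`, and `save_text` are hypothetical names, and the bytes are a stand-in for a real download, not an actual PDF. Note that in text mode `open()` has already created the file before `write()` fails, so an empty file is left behind.

```python
# Hypothetical stand-in for web_file.read(); in Python 3 this is a bytes object.
PDF_BYTES = b"%PDF-1.4\r\n%\xe2\xe3\xcf\xd3\r\n"


def save_binary(data, path):
    """Correct: 'wb' writes the bytes exactly as received."""
    with open(path, "wb") as f:
        f.write(data)


def save_text(data, path):
    """The question's mode: 'w' expects str, so writing bytes raises TypeError.

    The file is created (and truncated) by open() before write() fails,
    which leaves an empty file on disk.
    """
    with open(path, "w") as f:
        f.write(data)
```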

Answered 2025-04-18 by Python大师

Here is a working example:

try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3

def main():
    download_file("http://mensenhandel.nl/files/pdftest2.pdf")

def download_file(download_url):
    response = urlopen(download_url)
    # 'wb' writes the PDF bytes unchanged
    with open("document.pdf", 'wb') as file:
        file.write(response.read())
    print("Completed")

if __name__ == "__main__":
    main()
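Whichever approach you use, a quick look at the first bytes tells you whether the download really is a PDF, an empty file, or something else (for example an HTML page saved under a .pdf name). A sketch assuming a well-formed PDF starts with the `%PDF-` header; `diagnose_download` is a hypothetical helper name.

```python
import os

# A well-formed PDF begins with a header line such as "%PDF-1.4".
PDF_MAGIC = b"%PDF-"


def diagnose_download(path):
    """Classify a downloaded file as 'empty', 'pdf', or 'not a pdf' by its first bytes."""
    if os.path.getsize(path) == 0:
        return "empty"
    with open(path, "rb") as f:
        head = f.read(len(PDF_MAGIC))
    return "pdf" if head == PDF_MAGIC else "not a pdf"
```

Usage: `diagnose_download("document.pdf")` after running any of the download snippets.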

Answered 2025-04-18 by Python大师
