Downloading large files in Python with requests

import requests

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    f = open(local_filename, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
    f.close()
    return

3 Answers

User

Answer 1 · edited 2024-05-16 02:42:20

With the following streaming code, Python's memory usage stays bounded regardless of the size of the downloaded file:

import requests

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192): 
                if chunk: # filter out keep-alive new chunks
                    f.write(chunk)
                    # f.flush()
    return local_filename

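Note that the number of bytes returned by each iter_content iteration is not exactly chunk_size; expect it to vary from one iteration to the next, and it can be considerably larger.

See http://docs.python-requests.org/en/latest/user/advanced/#body-content-workflow for further reference.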
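Because the chunk sizes are not fixed, counting bytes is more reliable than counting chunks if you want to know how much was written. Here is a minimal sketch of that idea; the names save_chunks and the path parameter are my own, not part of requests, and the helper works on any iterable of byte chunks:

```python
def save_chunks(chunks, path):
    """Write an iterable of byte chunks to `path` and return the bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                written += f.write(chunk)
    return written


def download_file(url, path, chunk_size=8192):
    """Stream `url` to `path` with requests; return the number of bytes saved."""
    # Imported here so that save_chunks stays usable with only the stdlib.
    import requests

    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        return save_chunks(r.iter_content(chunk_size=chunk_size), path)
```

If the server sends a Content-Length header, the returned count can be compared against it as a rough completeness check, keeping in mind that the header describes the encoded body, so the two may differ for compressed responses.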

User

Answer 2 · edited 2024-05-16 02:42:20

It's much easier if you use Response.raw and shutil.copyfileobj():

import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    return local_filename

This streams the file to disk without using excessive memory, and the code is simple.
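One caveat with this approach: Response.raw gives you the bytes as they came over the wire, so a body sent with a gzip or deflate Content-Encoding would be saved still compressed. A possible workaround, sketched below, is to set decode_content on the underlying urllib3 response before copying. The helper name copy_raw_to_file is my own, and it assumes it is handed a response obtained with stream=True:

```python
import shutil


def copy_raw_to_file(response, path):
    """Copy response.raw to `path`, asking urllib3 to undo any Content-Encoding."""
    # urllib3's HTTPResponse.read() falls back to this attribute when
    # decode_content is not passed explicitly, which is how copyfileobj calls it.
    response.raw.decode_content = True
    with open(path, "wb") as f:
        shutil.copyfileobj(response.raw, f)
    return path
```

Usage would look something like `with requests.get(url, stream=True) as r: copy_raw_to_file(r, local_filename)`, ideally after an `r.raise_for_status()` so that an error page is not saved as if it were the file.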

User

Answer 3 · edited 2024-05-16 02:42:20

Not exactly what the OP was asking, but... it's very easy to do this with urllib:

from urllib.request import urlretrieve
url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
dst = 'ubuntu-16.04.2-desktop-amd64.iso'
urlretrieve(url, dst)

Or this way, if you want to save it to a temporary file:

from urllib.request import urlopen
from shutil import copyfileobj
from tempfile import NamedTemporaryFile
url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
    copyfileobj(fsrc, fdst)
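Since delete=False leaves the temporary file on disk, the caller needs its path to use it and to clean it up later. A small sketch that wraps the same calls and returns the path; the function name download_to_tempfile and the suffix parameter are illustrative choices of mine:

```python
from pathlib import Path
from shutil import copyfileobj
from tempfile import NamedTemporaryFile
from urllib.request import urlopen


def download_to_tempfile(url, suffix=""):
    """Stream `url` into a named temporary file and return the file's path."""
    with urlopen(url) as fsrc, NamedTemporaryFile(delete=False, suffix=suffix) as fdst:
        copyfileobj(fsrc, fdst)  # copies in fixed-size blocks, not all at once
        return Path(fdst.name)
```

The file is flushed and closed when the with block exits, so the returned path can be read immediately; removing it afterwards (for example with `path.unlink()`) is left to the caller.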

I watched the process:

watch 'ps -p 18647 -o pid,ppid,pmem,rsz,vsz,comm,args; ls -al *.iso'

And I saw the file growing, but memory usage stayed at 17 MB. Am I missing something?
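For what it's worth, the same observation can be roughly reproduced without a real download by measuring Python-level allocations with tracemalloc while copyfileobj runs. This is only a sketch: FakeBody is a made-up stand-in for a network response (it just hands out zero bytes), and tracemalloc does not report the whole process footprint the way ps does.

```python
import shutil
import tempfile
import tracemalloc


class FakeBody:
    """Hypothetical stand-in for a response body: serves `size` zero bytes."""

    def __init__(self, size):
        self.remaining = size

    def read(self, n=-1):
        if n is None or n < 0:
            n = self.remaining
        n = min(n, self.remaining)
        self.remaining -= n
        return bytes(n)  # empty bytes signals end of stream


def peak_memory_while_copying(total_bytes, chunk_size=64 * 1024):
    """Copy `total_bytes` to a temp file; return (bytes copied, peak traced bytes)."""
    src = FakeBody(total_bytes)
    tracemalloc.start()
    try:
        with tempfile.TemporaryFile() as dst:
            shutil.copyfileobj(src, dst, chunk_size)
            copied = dst.tell()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return copied, peak
```

Calling `peak_memory_while_copying(16 * 1024 * 1024)` should report a peak that depends on chunk_size rather than on the total size, which is consistent with the flat memory usage seen above.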
