Python: Unpredictable memory error when downloading large files

5 votes

1 answer

3226 views

数据工程师

Asked 2025-04-16 14:46

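I wrote a Python script that downloads many video files (each between 50 and 400 MB) from an HTTP server. So far it has worked well through long download lists, but it occasionally fails with a MemoryError.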

My machine has about 1 GB of RAM free, and I don't think memory ever gets close to full while the script is running.

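I have monitored memory usage in Task Manager and Performance Monitor, and it always behaves the same way: memory climbs slowly during a download and drops back to normal once the download finishes (so there is no small, gradual leak).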

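Each download works like this: the script creates the output file, which stays at 0 KB until the download finishes (or the program crashes); only then is the entire file written in one go and closed.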

import os
import time
import urllib2

# urls, filenames, folderName, headers and domain are defined earlier in the script
for i in range(len(urls)):
    videoPath = os.path.join(folderName, filenames[i] + '.mov')
    if os.path.exists(videoPath):
        print 'File exists, continuing.'
        continue

    # Request the download page
    req = urllib2.Request(urls[i], headers=headers)

    sock = urllib2.urlopen(req)
    responseHeaders = sock.headers
    body = sock.read()
    sock.close()

    # Search the page for the download URL
    tmp = body.find('/getfile/')
    downloadSuffix = body[tmp:body.find('"', tmp)]
    downloadUrl = domain + downloadSuffix

    req = urllib2.Request(downloadUrl, headers=headers)

    # The open parenthesis lets the format arguments continue on the next line
    print '%s Downloading %s, file %i of %i' % (
        time.ctime(), filenames[i], i + 1, len(urls))

    f = urllib2.urlopen(req)

    # Open our local file for writing, 'b' for binary file mode
    video_file = open(videoPath, 'wb')

    # Write the downloaded data to the local file
    video_file.write(f.read()) ##### MemoryError: out of memory #####
    video_file.close()

    print '%s Download complete!' % time.ctime()

    # Free up memory, in hopes of preventing memory errors
    del f
    del video_file

Here is the stack trace:

  File "downloadVideos.py", line 159, in <module>
    main()
  File "downloadVideos.py", line 136, in main
    video_file.write(f.read())
  File "c:\python27\lib\socket.py", line 358, in read
    buf.write(data)
MemoryError: out of memory

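Tags: memory-management, http-request, stack-trace, performance-monitoring, file-write, memory-error, download-optimization, large-file-download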

1 Answer

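Your problem is here: f.read(). Called without a size, it tries to load the entire file into memory at once. The traceback shows the MemoryError being raised inside socket.py while read() is still appending received data to its buffer (buf.write(data)), so the whole 50 to 400 MB response is being accumulated in memory before anything is written to disk. Instead, read the file in chunks, e.g. chunk = f.read(4096), which returns at most 4096 bytes per call, and write each chunk to a temporary file as it arrives.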
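A minimal sketch of the chunked approach described above, under a few assumptions: the response is any file-like object with a read(n) method (the object returned by urllib2.urlopen in Python 2.7, or urllib.request.urlopen in Python 3), and the helper names copy_in_chunks and download_to_path are hypothetical, introduced only for illustration. The temporary file is created with a .part suffix in the same folder as the target so that the final rename stays on one filesystem. The code should run unchanged on Python 2.7 and Python 3.

```python
import io
import os
import tempfile


def copy_in_chunks(response, out_file, chunk_size=4096):
    """Copy a file-like object to out_file, chunk_size bytes at a time.

    Only one chunk is held in memory at once, so memory use stays flat
    no matter how large the download is. Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read means end of stream
            break
        out_file.write(chunk)
        total += len(chunk)
    return total


def download_to_path(response, final_path, chunk_size=4096):
    """Stream response into a temporary .part file, then rename it.

    final_path only appears once the copy has finished, so a crash
    mid-download does not leave a truncated file at final_path.
    """
    folder = os.path.dirname(os.path.abspath(final_path))
    fd, tmp_path = tempfile.mkstemp(suffix='.part', dir=folder)
    try:
        with os.fdopen(fd, 'wb') as tmp_file:
            total = copy_in_chunks(response, tmp_file, chunk_size)
        # Assumes final_path does not exist yet (the question's loop checks
        # this first); on Windows os.rename fails if the target exists.
        os.rename(tmp_path, final_path)
    except BaseException:
        # Remove the partial temporary file, then re-raise the original error
        os.remove(tmp_path)
        raise
    return total


if __name__ == '__main__':
    # io.BytesIO stands in for the HTTP response; anything with read(n) works
    fake_response = io.BytesIO(b'x' * 10000)
    target = os.path.join(tempfile.mkdtemp(), 'video.mov')
    print(download_to_path(fake_response, target))  # 10000
    print(os.path.getsize(target))  # 10000
```

In the question's loop, the open/write/close lines would become download_to_path(f, folderName + '/' + filenames[i] + '.mov'). Writing to a temporary name and renaming only after the copy completes also means a crash no longer leaves a file at the final path, which the os.path.exists check at the top of the loop would otherwise treat as already downloaded. The standard library's shutil.copyfileobj(f, video_file, 4096) performs the same chunked loop as copy_in_chunks. A 4096-byte chunk keeps memory minimal, but for files of hundreds of MB a larger chunk (for example 64 KB to 1 MB) typically means far fewer read and write calls while memory still stays bounded.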

Answered 2025-04-16 by Python大师
