Reading a large file in chunks, compressing it, and writing it out in chunks
I've run into a problem: the files I handle have become too large to process. They keep growing and will continue to grow. Because of restrictions in the third-party application I upload the compressed files to, deflate is the only compression format I can use.
The server that runs this script has limited memory, so it regularly runs out of memory. I'm therefore trying to read and write the data in chunks and still end up with the compressed file I need.
Up to now I have been using the code below to compress the files and reduce their size. It has always worked fine, but the files are now too large to process or compress this way.
import zlib

with open(file_path_partial, 'rb') as file_upload, open(file_path, 'wb') as file_compressed:
    file_compressed.write(zlib.compress(file_upload.read()))
I have tried a few different approaches to solve this, but none of them have worked so far.
1)
import gzip
import shutil

with open(file_path_partial, 'rb') as file_upload:
    with open(file_path, 'wb') as file_compressed:
        with gzip.GzipFile(file_path_partial, 'wb', fileobj=file_compressed) as file_compressed:
            shutil.copyfileobj(file_upload, file_compressed)
2)
import zlib

BLOCK_SIZE = 64

compressor = zlib.compressobj(1)
filename = file_path_partial

with open(filename, 'rb') as input:
    with open(file_path, 'wb') as file_compressed:
        while True:
            block = input.read(BLOCK_SIZE)
            if not block:
                break
            file_compressed.write(compressor.compress(block))
1 Answer
The example below reads the data in 64 KB blocks, modifies each block, and then writes it out to a gzip-format file.
Is that what you're after?
import gzip

with open("test.txt", "rb") as fin, gzip.GzipFile("modified.txt.gz", "w") as fout:
    while True:
        block = fin.read(65536)  # read in 64k blocks
        if not block:
            break
        # comment next line to just write through
        block = block.replace(b"a", b"A")
        fout.write(block)
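If the upload target really does require a deflate/zlib stream rather than gzip, the same chunked pattern can be driven by zlib directly; the detail missing from attempt 2 in the question is the final compressor.flush(), without which the output stream is truncated. A minimal sketch, reusing the file_path_partial and file_path names from the question:

import zlib

BLOCK_SIZE = 65536  # 64 KB per read, same block size as the gzip example

compressor = zlib.compressobj(9)  # zlib-wrapped deflate stream, level 9

with open(file_path_partial, 'rb') as fin, open(file_path, 'wb') as fout:
    while True:
        block = fin.read(BLOCK_SIZE)
        if not block:
            break
        fout.write(compressor.compress(block))
    # flush() emits any data still buffered inside the compressor plus the
    # stream trailer; without it the compressed file is incomplete
    fout.write(compressor.flush())

Only one 64 KB block (plus the compressor's small internal buffer) is held in memory at a time, which matches the low-memory constraint described in the question.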