Iteratively unzipping a set number of files with Python

Posted 2024-04-19 14:09:26


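I have 2 TB of data, and I need to unzip the files to run some analysis. However, because of limited hard disk space, I can't unzip all of the files at once. My idea is to unzip the first 2000, run my analysis, and then repeat for the next 2000. How can I do that?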

import os, glob
import zipfile


root = 'C:\\Users\\X\\*'
directory = 'C:\\Users\\X'
extension = ".zip"
to_save = 'C:\\Users\\X\\to_save'

#x = os.listdir(path)[:2000]
for folder in glob.glob(root):
    if folder.endswith(extension): # check for ".zip" extension
        try:
            print(folder)
            os.chdir(to_save)
            zipfile.ZipFile(os.path.join(directory, folder)).extractall(os.path.join(directory, os.path.splitext(folder)[0]))

        except:
            pass

1 answer
User
Answer #1 · posted 2024-04-19 14:09:26

How about this:

import os
import glob
import zipfile

root = 'C:\\Users\\X\\*'
directory = 'C:\\Users\\X'
extension = ".zip"
to_save = 'C:\\Users\\X\\to_save'

# list comp of all '.zip' folders
folders = [folder for folder in glob.glob(root) if folder.endswith(extension)]

# only executes while there are folders remaining to be processed
while folders:
    # slicing never overruns: this takes the next 2000 paths,
    # or all that remain (i.e. 1152 were left)
    temp = folders[:2000]

    # drop the batch just taken from the list still to be processed
    folders = folders[2000:]

    # same extraction as in your code, I just swapped 'x' in place of 'folder'
    for x in temp:
        try:
            print(x)
            os.chdir(to_save)
            zipfile.ZipFile(os.path.join(directory, x)).extractall(os.path.join(directory, os.path.splitext(x)[0]))
        except (OSError, zipfile.BadZipFile) as err:
            # report the failure instead of silently skipping the archive
            print(f"could not extract {x}: {err}")

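This builds a temporary list of .zip paths and then removes those elements from the original list. The only downside is that folders gets modified, so if you need it elsewhere, it will end up empty.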
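If you would rather keep the original list intact, and actually free the disk space between batches, something along these lines might work. It is only a sketch under a few assumptions: `batched`, `process_in_batches`, `analyze` and `scratch_root` are hypothetical names that do not appear in the question, and `analyze` stands in for whatever analysis you run on a directory of extracted files. Each batch is extracted into a temporary directory that is deleted once the analysis of that batch returns, so at most one batch is ever unpacked on disk.

```python
import os
import tempfile
import zipfile
from itertools import islice


def batched(paths, size):
    """Yield successive lists of at most `size` items without modifying `paths`."""
    it = iter(paths)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_in_batches(zip_paths, analyze, batch_size=2000, scratch_root=None):
    """Extract one batch at a time, analyze it, then free the disk space.

    `analyze` is a hypothetical callback that receives the directory holding
    the extracted batch; its return values are collected in a list.
    """
    results = []
    for batch in batched(zip_paths, batch_size):
        # The temporary directory and its contents are deleted on leaving the block.
        with tempfile.TemporaryDirectory(dir=scratch_root) as scratch:
            for path in batch:
                # Each archive gets its own subfolder, named after the .zip file.
                name = os.path.splitext(os.path.basename(path))[0]
                try:
                    with zipfile.ZipFile(path) as zf:
                        zf.extractall(os.path.join(scratch, name))
                except (OSError, zipfile.BadZipFile) as err:
                    print(f"could not extract {path}: {err}")
            results.append(analyze(scratch))
    return results
```

With the paths from the question you could call it as `process_in_batches(glob.glob('C:\\Users\\X\\*.zip'), analyze)`, after importing `glob` and defining your own `analyze`. Pass `scratch_root` if the temporary folders should live on a particular drive. Note that archives sharing the same file name would be extracted into the same subfolder within a batch.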
