Untar a file in a Python script using a wildcard

Posted 2024-04-20 09:50:25


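I am trying to import a tar.gz file from HDFS in a Python script and then untar it. The file name looks like 20160822073413-EoRcGvXMDIB5SVenEyD4pOEADPVPhPsg.tar.gz and always has the same structure.

In my Python script, I would like to copy it locally and extract the file. I am using the following commands to do this: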

import subprocess
import os
import datetime
import time

today = time.strftime("%Y%m%d")

#Copy tar file from HDFS to local server
args = ["hadoop","fs","-copyToLocal", "/locationfile/" + today + "*"]

p=subprocess.Popen(args)

p.wait()

#Untar the CSV file 
args = ["tar","-xzvf",today + "*"]

p=subprocess.Popen(args)

p.wait()

The import works fine, but I am unable to extract the file; tar reports an error.

Can anyone help me?

Thanks a lot!


3 Answers

I found a way to do what I needed: instead of using OS commands, I used Python's tarfile module, and it works!

import glob
import os
import tarfile

os.chdir("/folder_to_scan/")
for file in glob.glob("*.tar.gz"):
    print(file)
    # Open and extract inside the loop so that every match is handled
    with tarfile.open(file) as tar:
        tar.extractall()

Hope this helps.

Regards, Majid

Try using the shell option:

p=subprocess.Popen(args, shell=True)

From the docs:

If shell is True, the specified command will be executed through the shell. This can be useful if you are using Python primarily for the enhanced control flow it offers over most system shells and still want convenient access to other shell features such as shell pipes, filename wildcards, environment variable expansion, and expansion of ~ to a user’s home directory.

Note, however:

However, note that Python itself offers implementations of many shell-like features (in particular, glob, fnmatch, os.walk(), os.path.expandvars(), os.path.expanduser(), and shutil).
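As a minimal sketch of that note, the wildcard can be expanded in Python with glob, so tar receives real file names and no shell is needed. The helper name build_untar_commands and its directory parameter are made up for this illustration; they do not come from the question.

```python
import glob
import os


def build_untar_commands(prefix, directory="."):
    """Return one tar argument list per archive matching prefix*.tar.gz.

    glob.glob() does the wildcard expansion that a shell would otherwise
    do, so each command holds a real file name instead of a pattern.
    (Hypothetical helper, for illustration only.)
    """
    pattern = os.path.join(directory, prefix + "*.tar.gz")
    return [["tar", "-xzvf", path] for path in sorted(glob.glob(pattern))]
```

Each returned list can then be passed to subprocess.Popen or subprocess.call as it is, without shell=True. An empty result means no file matched the prefix.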

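In addition to @martriay's answer, you also have a typo: you wrote "20160822*.tar", while your files match the pattern "20160822*.tar.gz".

When applying shell=True, the command should be passed as a single string (see the documentation), like so: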

p=subprocess.Popen('tar -xzvf 20160822*.tar.gz', shell=True)
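To see why the string form matters, here is a small sketch that uses a harmless echo command as a stand-in for tar. It assumes a POSIX system, and the helper name shell_stdout is invented for this example.

```python
import subprocess


def shell_stdout(command):
    """Run command with shell=True and return what it wrote to stdout."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout


# On POSIX, a string is handed to the shell as one complete command line.
as_string = shell_stdout("echo hello")

# On POSIX, only the first list item is the command; the remaining items
# become arguments of the shell itself, so echo runs with no arguments.
as_list = shell_stdout(["echo", "hello"])
```

The same thing happens to ["tar", "-xzvf", today + "*"] combined with shell=True: tar is started without its options or file name.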

If you do not need p, simply use subprocess.call.
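A minimal sketch of what such a call might look like, assuming a POSIX shell and a tar binary on the PATH. The wrapper name untar_with_shell and its cwd parameter are illustrative and not part of the original answer.

```python
import subprocess


def untar_with_shell(pattern, cwd=None):
    """Run tar on a wildcard pattern through the shell; return the exit code.

    pattern is pasted into a shell command line, so it must come from a
    trusted source (for example a date prefix computed by the script).
    (Hypothetical wrapper, for illustration only.)
    """
    # subprocess.call waits for the command to finish and returns its exit
    # code, which replaces the Popen(...) followed by p.wait() pair.
    return subprocess.call("tar -xzvf " + pattern, shell=True, cwd=cwd)
```

For example, untar_with_shell("20160822*.tar.gz") returns 0 when tar succeeds and a non-zero code otherwise.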

But I suggest you make more use of the standard library, like so:

import glob
import tarfile

today = "20160822"  # compute your common prefix here
target_dir = "/tmp"  # choose where ever you want to extract the content

for targz_file in glob.glob('%s*.tar.gz' % today):
    with tarfile.open(targz_file, 'r:gz') as opened_targz_file:
        opened_targz_file.extractall(target_dir)
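
Tying this back to the question, the loop could be wrapped in a function that computes today's prefix the same way the question does. This is only a sketch: the function name extract_archives and its parameters are assumptions made for illustration.

```python
import glob
import os
import tarfile
import time


def extract_archives(source_dir, target_dir, prefix=None):
    """Extract every prefix*.tar.gz found in source_dir into target_dir.

    prefix defaults to today's date as YYYYMMDD, as in the question.
    Returns the list of archives that were extracted.
    (Hypothetical helper, for illustration only.)
    """
    if prefix is None:
        prefix = time.strftime("%Y%m%d")
    pattern = os.path.join(source_dir, prefix + "*.tar.gz")
    extracted = []
    for archive in sorted(glob.glob(pattern)):
        with tarfile.open(archive, "r:gz") as opened:
            # extractall trusts the archive contents; only use it on
            # archives that come from a source you control.
            opened.extractall(target_dir)
        extracted.append(archive)
    return extracted
```

An empty return value means nothing matched, which is worth checking after the hadoop copy step before assuming the extraction happened.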
