Writing files while preserving the folder structure

1 vote

1 answer

1389 views

Asked 2025-04-18 08:35

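I'm writing a script that reads files from one location, processes the data, and writes the results to another location. The user passes a -p argument on the command line to specify a top-level folder, and the script looks for all the files in that folder. I'm currently using glob, and the reading part works fine.

However, I also want to let the user specify an output folder where the processed files are written, and I'd like to preserve the folder structure of the input path.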

for eachFile in glob(args.path + "/*/*.json"):  # <- this seems dangerous. Better way?
  # do something to the json file

  # output the modified data to its new home
  #outfile = os.path.join(args.output, os.path.dirname(eachFile), eachFile) <- doesn't work
  outfile = os.path.join(args.output, os.path.dirname(eachFile)[1:], os.path.basename(eachFile))

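The last line is the best I've come up with so far, but it has a problem: it assumes the script is running on a POSIX system, because it strips the leading "/" from the directory. For example, if the input path is ~/Documents/2014 and the output path is /tmp, the file is written to /tmp/Users/myusername/Documents/2014/blah/whatever.json.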
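To make the symptom concrete, the path arithmetic can be reproduced in isolation. This is only a sketch: it uses `posixpath` so it behaves the same on any platform, it assumes `~` has already been expanded to `/Users/myusername`, and it assumes the last component is the bare file name (`basename`), which is what the resulting path above implies.

```python
import posixpath

# Hypothetical values matching the example: the input file found by glob
# (with ~ already expanded) and the user-supplied output folder.
each_file = "/Users/myusername/Documents/2014/blah/whatever.json"
output = "/tmp"

# Strip the leading "/" from the directory and re-root it under the output folder.
outfile = posixpath.join(
    output,
    posixpath.dirname(each_file)[1:],
    posixpath.basename(each_file),
)
print(outfile)  # /tmp/Users/myusername/Documents/2014/blah/whatever.json
```

The whole absolute input directory ends up nested under the output folder, rather than only the part below the top-level folder given with -p.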

This seems like a very common need, so I'm surprised I haven't found anyone else who needed to do this, or a simple module that makes it easy. Any suggestions?

Tags: command-line arguments, file paths, data processing, scripting, file handling, folder structure, POSIX, glob

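1 Answer

Here is a script that does more or less what you need. The key is to use os.walk rather than glob, because you want to descend into the directory structure. You will still need to add some sanity checks, but it is a good start.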

# Recurse through a source tree and process matching files.
import os
import shutil
import sys
from fnmatch import fnmatch


def process(src_dir, dst_dir, pattern='*'):
    """Walk src_dir, process every file whose name matches pattern, and
    store the result under dst_dir at the same path relative to src_dir.
    """
    assert os.path.abspath(src_dir) != os.path.abspath(dst_dir), \
        'Source and destination dir must differ.'
    for dirpath, dirnames, filenames in os.walk(src_dir):
        # Keep only the files that match pattern. A real list (not a lazy
        # filter object) is needed so the emptiness check works on Python 3.
        filenames = [fname for fname in filenames if fnmatch(fname, pattern)]

        if filenames:
            # Rebuild the directory relative to src_dir. Joining dst_dir with
            # an absolute dirpath would discard dst_dir entirely.
            rel_dir = os.path.relpath(dirpath, src_dir)
            dir_ = os.path.normpath(os.path.join(dst_dir, rel_dir))
            os.makedirs(dir_, exist_ok=True)
            for fname in filenames:
                in_fname = os.path.join(dirpath, fname)
                out_fname = os.path.join(dir_, fname)

                # At this point, the destination directory is created and you
                # have a valid input / output filename, so you'd call your
                # function to process these files.  I just copy them :D
                shutil.copyfile(in_fname, out_fname)


if __name__ == '__main__':
    process(sys.argv[1], sys.argv[2], '*.txt')
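
The same idea can also be written with pathlib, which may read more naturally on Python 3. The following is a minimal sketch rather than a drop-in replacement: the function name `mirror_tree`, its `process` callback parameter, and the default `*.json` pattern are all made up for illustration. It relies on `Path.rglob` to recurse and on `Path.relative_to` to compute each file's path below the source folder, so nothing POSIX-specific such as stripping a leading "/" is needed.

```python
import shutil
from pathlib import Path


def mirror_tree(src_dir, dst_dir, pattern="*.json", process=shutil.copyfile):
    """Call process(in_path, out_path) for every file under src_dir whose
    name matches pattern, placing out_path under dst_dir at the same
    relative location. Returns the list of output paths.

    mirror_tree and its parameters are hypothetical names for this sketch.
    """
    src = Path(src_dir).resolve()
    dst = Path(dst_dir).resolve()
    written = []
    for in_path in sorted(src.rglob(pattern)):
        if not in_path.is_file():
            continue  # rglob can also match directories
        if dst == in_path.parent or dst in in_path.parents:
            continue  # skip earlier output if dst_dir lives inside src_dir
        # Path of the file relative to the top-level input folder.
        out_path = dst / in_path.relative_to(src)
        out_path.parent.mkdir(parents=True, exist_ok=True)
        process(in_path, out_path)
        written.append(out_path)
    return written
```

With the paths from the question, `mirror_tree("~/Documents/2014", "/tmp")` (after expanding `~`, for example with `os.path.expanduser`) would write `blah/whatever.json` to `/tmp/blah/whatever.json`. Passing your own function as `process` replaces the plain copy with the real JSON handling.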

Answered 2025-04-18 by Python大师
