Python recursive folder read

334 votes
17 answers
461254 views
Asked 2025-04-15 18:57

I have a C++/Obj-C background and I am just getting started with Python (I have been writing it for about an hour).

I am writing a script to recursively read the contents of text files in a folder structure.

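The problem is that the code I have written only works one folder deep. I can see why that happens (see #hardcoded path), but since I am so new to Python I don't know how to move forward.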

Python code:

import os
import sys

rootdir = sys.argv[1]

for root, subFolders, files in os.walk(rootdir):

    for folder in subFolders:
        outfileName = rootdir + "/" + folder + "/py-outfile.txt" # hardcoded path
        folderOut = open( outfileName, 'w' )
        print "outfileName is " + outfileName

        for file in files:
            filePath = rootdir + '/' + file
            f = open( filePath, 'r' )
            toWrite = f.read()
            print "Writing '" + toWrite + "' to" + filePath
            folderOut.write( toWrite )
            f.close()

        folderOut.close()

17 Answers

45

I agree with Dave Webb: os.walk yields one item for each directory in the tree. In fact, you don't need to care about subFolders at all.

Code like this should work:

import os
import sys

rootdir = sys.argv[1]

for folder, subs, files in os.walk(rootdir):
    with open(os.path.join(folder, 'python-outfile.txt'), 'w') as dest:
        for filename in files:
            with open(os.path.join(folder, filename), 'r') as src:
                dest.write(src.read())
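
One thing to watch with a loop like the one above: on a second run, the python-outfile.txt left behind by the first run shows up in files and gets opened for reading while it is being rewritten. Below is a minimal sketch that skips it. The function name concat_per_folder, the constant OUT_NAME, and the demo file names are hypothetical, chosen only for illustration.

```python
import os
import tempfile

OUT_NAME = 'python-outfile.txt'  # same output name as in the answer above


def concat_per_folder(rootdir):
    """Write the contents of each folder's files into OUT_NAME in that folder."""
    for folder, subs, files in os.walk(rootdir):
        with open(os.path.join(folder, OUT_NAME), 'w') as dest:
            for filename in sorted(files):
                if filename == OUT_NAME:
                    continue  # skip output left over from an earlier run
                with open(os.path.join(folder, filename), 'r') as src:
                    dest.write(src.read())


# Small demo on a throwaway tree (hypothetical file names).
demo_root = tempfile.mkdtemp()
os.makedirs(os.path.join(demo_root, 'sub'))
with open(os.path.join(demo_root, 'a.txt'), 'w') as f:
    f.write('top\n')
with open(os.path.join(demo_root, 'sub', 'b.txt'), 'w') as f:
    f.write('nested\n')

concat_per_folder(demo_root)
concat_per_folder(demo_root)  # second run does not re-read its own output
```

Sorting files is optional; it only makes the order of the concatenated content predictable.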

275

If you are using Python 3.5 or later, you can do this with a single glob call.

import glob

# root_dir is assumed to be defined already and needs a trailing slash (i.e. /root/dir/)
for filename in glob.iglob(root_dir + '**/*.txt', recursive=True):
    print(filename)

As mentioned in the documentation:

"If recursive is true, the pattern '**' will match any files and zero or more directories and subdirectories."

If you want every file, you can use

import glob

for filename in glob.iglob(root_dir + '**/**', recursive=True):
    print(filename)
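
If you would rather not depend on the trailing slash, one option is to build the pattern with os.path.join. This is a minimal sketch on a throwaway directory. The tree layout and file names are made up for the demo, and root_dir is set to a temporary folder rather than a real path.

```python
import glob
import os
import tempfile

# Throwaway tree with hypothetical file names, just for the demo.
root_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(root_dir, 'sub', 'deeper'))
for rel in ('a.txt',
            os.path.join('sub', 'b.txt'),
            os.path.join('sub', 'deeper', 'c.txt'),
            os.path.join('sub', 'skip.log')):
    with open(os.path.join(root_dir, rel), 'w') as f:
        f.write('x')

# os.path.join inserts the separator, so root_dir needs no trailing slash.
pattern = os.path.join(root_dir, '**', '*.txt')
txt_files = sorted(
    os.path.relpath(path, root_dir)
    for path in glob.iglob(pattern, recursive=True)
)
for name in txt_files:
    print(name)
```

Because '**' also matches zero directories, a.txt at the top level is included along with the nested .txt files, while skip.log is filtered out by the '*.txt' part of the pattern.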

467

Make sure you understand the three return values of os.walk:

for root, subdirs, files in os.walk(rootdir):

They have the following meaning:

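  • root: the path currently being "walked"
  • subdirs: the names of the subdirectories directly inside root
  • files: the names of the entries directly inside root that are not directories (files inside subdirs are not included)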
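
To make the three values concrete, here is a small sketch that walks a throwaway tree and records what os.walk yields at each step. The directory layout and the variable names top and visited are assumptions made for the demo.

```python
import os
import tempfile

# Build a tiny tree: top/ contains a.txt and sub/, and sub/ contains b.txt.
top = tempfile.mkdtemp()
os.mkdir(os.path.join(top, 'sub'))
for rel in ('a.txt', os.path.join('sub', 'b.txt')):
    with open(os.path.join(top, rel), 'w') as f:
        f.write('x')

visited = []
for root, subdirs, files in os.walk(top):
    # Store root relative to top so the result is easy to read.
    visited.append((os.path.relpath(root, top), sorted(subdirs), sorted(files)))

for entry in visited:
    print(entry)
```

The loop body runs once per directory: first for the top folder (which lists sub as a subdirectory and a.txt as a file), then for sub itself (which lists only b.txt).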

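Also, please use os.path.join to build paths instead of concatenating with a slash! Your problem is filePath = rootdir + '/' + file: you need to join the folder currently being "walked", not the topmost folder. So it should be filePath = os.path.join(root, file). By the way, "file" is a builtin name in Python 2, so it is normally not used as a variable name.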
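
The difference between the two ways of building the path can be seen in a short sketch. It assumes a throwaway tree with a single file in a subfolder; the names wrong_path, right_path and results are only for illustration.

```python
import os
import tempfile

# Throwaway tree: rootdir/sub/b.txt is the only file.
rootdir = tempfile.mkdtemp()
os.mkdir(os.path.join(rootdir, 'sub'))
with open(os.path.join(rootdir, 'sub', 'b.txt'), 'w') as f:
    f.write('nested\n')

results = []
for root, subdirs, files in os.walk(rootdir):
    for filename in files:
        wrong_path = rootdir + '/' + filename      # always points into the top folder
        right_path = os.path.join(root, filename)  # points into the folder being walked
        results.append((os.path.exists(wrong_path), os.path.exists(right_path)))

print(results)
```

For the nested file, the path built from rootdir does not exist, while the one built from root does. That is why the original script only works one level deep.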

Another problem is your loops, which should look like this, for example:

import os
import sys

walk_dir = sys.argv[1]

print('walk_dir = ' + walk_dir)

# If your current working directory may change during script execution, it's recommended to
# immediately convert program arguments to an absolute path. Then the variable root below will
# be an absolute path as well. Example:
# walk_dir = os.path.abspath(walk_dir)
print('walk_dir (absolute) = ' + os.path.abspath(walk_dir))

for root, subdirs, files in os.walk(walk_dir):
    print('--\nroot = ' + root)
    list_file_path = os.path.join(root, 'my-directory-list.txt')
    print('list_file_path = ' + list_file_path)

    with open(list_file_path, 'wb') as list_file:
        for subdir in subdirs:
            print('\t- subdirectory ' + subdir)

        for filename in files:
            file_path = os.path.join(root, filename)

            print('\t- file %s (full path: %s)' % (filename, file_path))

            with open(file_path, 'rb') as f:
                f_content = f.read()
                list_file.write(('The file %s contains:\n' % filename).encode('utf-8'))
                list_file.write(f_content)
                list_file.write(b'\n')

In case you didn't know, the with statement for files is a shorthand:

with open('filename', 'rb') as f:
    dosomething()

# is effectively the same as

f = open('filename', 'rb')
try:
    dosomething()
finally:
    f.close()
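
As a quick check of that equivalence, the sketch below raises an error inside a with block and then confirms the file was still closed. The temporary file and the name closed_after_error are assumptions made for the demo.

```python
import os
import tempfile

# Create an empty temporary file to open.
fd, path = tempfile.mkstemp()
os.close(fd)

try:
    with open(path, 'rb') as f:
        raise ValueError('simulated failure inside the block')
except ValueError:
    pass  # the error is expected; we only care about the file state

# f is still bound after the with block, but the file object is closed.
closed_after_error = f.closed
print(closed_after_error)

os.remove(path)
```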
