Recursive folder reading in Python

Posted 2024-05-29 03:56:25


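I come from a C++/Obj-C background and am just discovering Python (I have been writing it for about an hour). I am writing a script that recursively reads the contents of text files in a folder structure.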

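The problem is that the code I have written only works one folder deep. I can see why in the code (see # hardcoded path); I just don't know how to go further in Python, since my experience with it is brand new.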

Python code:

import os
import sys

rootdir = sys.argv[1]

for root, subFolders, files in os.walk(rootdir):

    for folder in subFolders:
        outfileName = rootdir + "/" + folder + "/py-outfile.txt" # hardcoded path
        folderOut = open( outfileName, 'w' )
        print "outfileName is " + outfileName

        for file in files:
            filePath = rootdir + '/' + file
            f = open( filePath, 'r' )
            toWrite = f.read()
            print "Writing '" + toWrite + "' to" + filePath
            folderOut.write( toWrite )
            f.close()

        folderOut.close()

3 answers

If you are using Python 3.5 or later, you can do this with a single glob call.

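import glob
import os

# root_dir is the directory to search (defined elsewhere).
# os.path.join avoids depending on a trailing separator in root_dir.
for filename in glob.iglob(os.path.join(root_dir, '**', '*.txt'), recursive=True):
    print(filename)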

As the documentation states:

If recursive is true, the pattern '**' will match any files and zero or more directories and subdirectories.
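To make the "zero or more directories" wording concrete, here is a minimal sketch that builds a throwaway tree and globs it. The file and folder names (top.txt, a, b, and so on) are made up for illustration; only glob.iglob, os.path and tempfile are real library calls.

```python
import glob
import os
import tempfile

# Build a small temporary tree (hypothetical names) to see what '**' matches.
with tempfile.TemporaryDirectory() as root_dir:
    os.makedirs(os.path.join(root_dir, 'a', 'b'))
    for rel in ('top.txt',
                os.path.join('a', 'mid.txt'),
                os.path.join('a', 'b', 'deep.txt')):
        with open(os.path.join(root_dir, rel), 'w') as f:
            f.write('x')

    pattern = os.path.join(root_dir, '**', '*.txt')
    # Collect paths relative to root_dir so they are easy to read.
    matches = sorted(
        os.path.relpath(p, root_dir)
        for p in glob.iglob(pattern, recursive=True)
    )

print(matches)
```

The file directly inside root_dir is matched as well, because '**' may stand for zero directories.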

If you want every file, you can use

import glob
import os

# root_dir is the directory to search (defined elsewhere).
for filename in glob.iglob(os.path.join(root_dir, '**', '*'), recursive=True):
    print(filename)

Make sure you understand the three return values of os.walk:

for root, subdirs, files in os.walk(rootdir):

They have the following meaning:

  • root: the path currently being "walked"
  • subdirs: the entries in root that are directories
  • files: the entries in root (not in subdirs) that are not directories
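A small sketch can make these three values visible. It assumes a temporary tree with invented names (a.txt, sub, inner, b.txt) and records what os.walk yields for each directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: top/a.txt, top/sub/b.txt, top/sub/inner/ (empty)
    os.makedirs(os.path.join(top, 'sub', 'inner'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(top, rel), 'w') as f:
            f.write('x')

    visited = {}
    for root, subdirs, files in os.walk(top):
        # Store paths relative to the top folder for readability.
        rel_root = os.path.relpath(root, top)
        visited[rel_root] = (sorted(subdirs), sorted(files))
        print(rel_root, subdirs, files)
```

Each directory in the tree shows up exactly once as root, including the empty one, and subdirs and files only ever describe the immediate contents of that root.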

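Please use os.path.join instead of concatenating with a slash! Your problem is filePath = rootdir + '/' + file: you must join the name to the folder currently being walked, not to the topmost folder. So it has to be filePath = os.path.join(root, file). By the way, "file" is a builtin name (in Python 2), so it is not normally used as a variable name.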
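The difference between the two ways of building the path can be checked directly. This is a sketch under assumed names (sub, note.txt); it compares the question's rootdir-based path with the os.path.join(root, ...) form for a file that lives one level down.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as rootdir:
    # Hypothetical layout: rootdir/sub/note.txt
    os.makedirs(os.path.join(rootdir, 'sub'))
    with open(os.path.join(rootdir, 'sub', 'note.txt'), 'w') as f:
        f.write('hello')

    wrong_ok = []
    right_ok = []
    for root, subdirs, files in os.walk(rootdir):
        for filename in files:
            wrong = rootdir + '/' + filename      # always points at the top folder
            right = os.path.join(root, filename)  # points at the folder being walked
            wrong_ok.append(os.path.exists(wrong))
            right_ok.append(os.path.exists(right))

print(wrong_ok, right_ok)
```

For any file below the top level, the rootdir-based path does not exist, which is why the original script only worked one folder deep.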

The other problem is your loops, which should look like this, for example:

import os
import sys

walk_dir = sys.argv[1]

print('walk_dir = ' + walk_dir)

# If your current working directory may change during script execution, it's recommended to
# immediately convert program arguments to an absolute path. Then the variable root below will
# be an absolute path as well. Example:
# walk_dir = os.path.abspath(walk_dir)
print('walk_dir (absolute) = ' + os.path.abspath(walk_dir))

for root, subdirs, files in os.walk(walk_dir):
    print('--\nroot = ' + root)
    list_file_path = os.path.join(root, 'my-directory-list.txt')
    print('list_file_path = ' + list_file_path)

    with open(list_file_path, 'wb') as list_file:
        for subdir in subdirs:
            print('\t- subdirectory ' + subdir)

        for filename in files:
            file_path = os.path.join(root, filename)

            print('\t- file %s (full path: %s)' % (filename, file_path))

            with open(file_path, 'rb') as f:
                f_content = f.read()
                list_file.write(('The file %s contains:\n' % filename).encode('utf-8'))
                list_file.write(f_content)
                list_file.write(b'\n')

In case you did not know, the with statement for files is shorthand:

with open('filename', 'rb') as f:
    dosomething()

# is effectively the same as

f = open('filename', 'rb')
try:
    dosomething()
finally:
    f.close()
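As a quick check of that equivalence, the following sketch (the file name demo.txt and the simulated error are made up) inspects the closed attribute after a normal exit and after an exception inside the with block.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.txt')  # hypothetical file name

    # Normal exit from the with block.
    with open(path, 'w') as f:
        f.write('data')
    closed_after_normal_exit = f.closed

    # Exit caused by an exception raised inside the block.
    try:
        with open(path, 'r') as g:
            raise ValueError('simulated failure')
    except ValueError:
        pass
    closed_after_exception = g.closed

print(closed_after_normal_exit, closed_after_exception)
```

In both cases the file is closed by the time control leaves the block, which is the behaviour of the try/finally form.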

I agree with Dave Webb: os.walk yields one item for each directory in the tree. In fact, you do not need to care about subFolders at all.

Code like this should work:

import os
import sys

rootdir = sys.argv[1]

for folder, subs, files in os.walk(rootdir):
    with open(os.path.join(folder, 'python-outfile.txt'), 'w') as dest:
        for filename in files:
            with open(os.path.join(folder, filename), 'r') as src:
                dest.write(src.read())
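One thing to be aware of: when the script is run a second time, the output file from the previous run is itself listed in files. A possible variation, sketched here with a hypothetical helper concat_per_folder and made-up test files, skips the output name so repeated runs give the same result. This is only one way to handle it, not part of the answer above.

```python
import os
import tempfile

OUT_NAME = 'python-outfile.txt'  # same output name as in the answer above


def concat_per_folder(rootdir, out_name=OUT_NAME):
    """Write each folder's files, concatenated, to out_name inside that folder."""
    for folder, subs, files in os.walk(rootdir):
        # Skip the output file itself so a re-run does not read it back in.
        sources = sorted(name for name in files if name != out_name)
        with open(os.path.join(folder, out_name), 'w') as dest:
            for filename in sources:
                with open(os.path.join(folder, filename), 'r') as src:
                    dest.write(src.read())


with tempfile.TemporaryDirectory() as rootdir:
    # Hypothetical layout: rootdir/a.txt and rootdir/sub/b.txt
    os.makedirs(os.path.join(rootdir, 'sub'))
    with open(os.path.join(rootdir, 'a.txt'), 'w') as f:
        f.write('A')
    with open(os.path.join(rootdir, 'sub', 'b.txt'), 'w') as f:
        f.write('B')

    concat_per_folder(rootdir)
    concat_per_folder(rootdir)  # second run sees the earlier output files

    with open(os.path.join(rootdir, OUT_NAME)) as f:
        top_result = f.read()
    with open(os.path.join(rootdir, 'sub', OUT_NAME)) as f:
        sub_result = f.read()

print(repr(top_result), repr(sub_result))
```

Sorting the names is an extra choice made here so the concatenation order is predictable; os.walk itself does not guarantee any particular order.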
