Python walk, but tread lightly

Posted 2024-03-28 20:11:32


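I want to walk a directory recursively, but I want Python to break out of any single listdir as soon as it hits a directory containing more than 100 files. Basically, I am searching for a (.TXT) file, but I want to avoid directories that hold large DPX image sequences (typically 10,000 files). Since DPX files live in directories of their own with no subdirectories, I would like to break out of that loop as early as possible.

Long story short: if Python encounters a file matching ".DPX$", it should stop listing that subdirectory, back out, skip it, and continue walking the other subdirectories.

Is it possible to break out of a directory listing loop before all of the list results have been returned?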


3 Answers

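To avoid os.listdir, use the operating-system-level functions instead, as @Charles Duffy said.

Inspired by another post: List files in a folder as a stream to begin process immediately

I added how to solve the OP's specific problem, and I used the reentrant version of the function.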

from ctypes import CDLL, c_char_p, c_int, c_long, c_ushort, c_byte, c_char, Structure, POINTER, byref, cast, sizeof, get_errno
from ctypes.util import find_library

class c_dir(Structure):
    """Opaque type for directory entries, corresponds to struct DIR"""
    pass

class c_dirent(Structure):
    """Directory entry"""
    # FIXME not sure these are the exactly correct types!
    _fields_ = (
        ('d_ino', c_long), # inode number
        ('d_off', c_long), # offset to the next dirent
        ('d_reclen', c_ushort), # length of this record
        ('d_type', c_byte), # type of file; not supported by all file system types
        ('d_name', c_char * 4096) # filename
        )
c_dirent_p = POINTER(c_dirent)
c_dirent_pp = POINTER(c_dirent_p)
c_dir_p = POINTER(c_dir)

c_lib = CDLL(find_library("c"))
opendir = c_lib.opendir
opendir.argtypes = [c_char_p]
opendir.restype = c_dir_p

readdir_r = c_lib.readdir_r
readdir_r.argtypes = [c_dir_p, c_dirent_p, c_dirent_pp]
readdir_r.restype = c_int

closedir = c_lib.closedir
closedir.argtypes = [c_dir_p]
closedir.restype = c_int

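import os

def listdirx(path):
    """
    A generator that yields the names of the entries in the directory `path`
    one at a time, so the caller can stop before the whole directory is read.
    """
    # opendir() takes a C string, so convert str paths to bytes first.
    dir_p = opendir(os.fsencode(path))

    if not dir_p:
        raise IOError("could not open directory: %r" % (path,))

    # Let ctypes own the entry buffer instead of calling malloc()/free() by hand.
    entry = c_dirent()
    # readdir_r() stores a pointer to the entry here, or NULL at end of directory.
    result_p = c_dirent_p()

    try:
        while True:
            res = readdir_r(dir_p, byref(entry), byref(result_p))
            if res:
                # A non-zero return value is an errno code.
                raise OSError(res, os.strerror(res))
            if not result_p:
                break
            name = entry.d_name
            if name not in (b".", b".."):
                yield os.fsdecode(name)
    finally:
        closedir(dir_p)

if __name__ == '__main__':
    import sys
    path = sys.argv[1]
    max_per_dir = int(sys.argv[2])
    # Print at most max_per_dir entries, then stop reading the directory.
    for idx, entry in enumerate(listdirx(path)):
        if idx >= max_per_dir:
            break
        print(entry)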
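
For comparison, here is a minimal sketch of the same streaming idea without ctypes. It swaps in a different technique from the one above: os.scandir(), which returns an iterator rather than a complete list. It assumes Python 3.6 or later (for the `with` form). The function name walk_lightly and its max_per_dir parameter are made up for illustration.

```python
import os

def walk_lightly(root, max_per_dir=100):
    """Yield .txt paths under root, abandoning any directory that has more
    than max_per_dir entries or contains a .dpx file."""
    stack = [root]
    while stack:
        current = stack.pop()
        subdirs, txt_files = [], []
        skip = False
        # os.scandir() hands back entries one at a time, so breaking out of
        # the loop stops reading the rest of the directory.
        with os.scandir(current) as entries:
            for count, entry in enumerate(entries, start=1):
                if count > max_per_dir or entry.name.lower().endswith(".dpx"):
                    skip = True
                    break
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                elif entry.name.lower().endswith(".txt"):
                    txt_files.append(entry.path)
        if skip:
            # Discard everything collected from this directory.
            continue
        for txt_path in txt_files:
            yield txt_path
        stack.extend(subdirs)
```

Results from a directory are held back until it has been scanned without hitting either limit, so a .txt file sitting next to a DPX sequence is never reported.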

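If by "directory listing loop" you mean os.listdir(), then no. That call cannot be interrupted. However, you can look at the os.walk() or os.path.walk() methods and simply prune every directory that contains DPX files. If you use os.walk() and walk top-down, you can control which directories Python descends into just by modifying the list of directories. os.path.walk() (Python 2 only; it was removed in Python 3) lets you choose where to walk through its visit function.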

According to the os.walk documentation:

When topdown is True, the caller can modify the dirnames list in-place (e.g., via del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, or to impose a specific order of visiting. Modifying dirnames when topdown is False is ineffective, since the directories in dirnames have already been generated by the time dirnames itself is generated.

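So in theory, if you empty dirnames, os.walk will not recurse into any further directories. Note the remark about "…via del or slice assignment": you cannot simply write dirnames = [], because that only rebinds the name and does not actually change the contents of the dirnames list.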
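
The pruning described above can be sketched roughly as follows. The function name find_txt_files is made up for illustration. Treating any file name ending in ".dpx" (case-insensitive) as the signal to skip a directory is an assumption based on the question.

```python
import os

def find_txt_files(root):
    """Yield paths of .txt files under root, skipping any directory that
    contains a DPX file, along with everything below it."""
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        if any(name.lower().endswith(".dpx") for name in filenames):
            # Slice assignment empties the very list os.walk() is holding,
            # so it will not descend any further from here.
            # A plain `dirnames = []` would only rebind the local name.
            dirnames[:] = []
            continue
        for name in filenames:
            if name.lower().endswith(".txt"):
                yield os.path.join(dirpath, name)
```

Note that os.walk() has already listed a directory in full by the time it yields it, so this avoids descending further and avoids processing the DPX directory's files, but it does not cut the listing itself short.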
