Traversing a nested dictionary of arbitrary depth (the dictionary represents a directory tree)

7 votes
5 answers
12821 views
Asked 2025-04-17 04:09

I was still a Python newbie when I wrote this.

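I ran into this problem because I want users to be able to select a set of files from a folder (and its subfolders), but unfortunately Tkinter's multi-file selection is buggy on Windows 7 (see http://bugs.python.org/issue8010).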

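So I'm trying to represent the folder structure another way (still with Tkinter): a mock-up of the folder structure built from labeled, indented checkboxes organized as a tree. For example, a folder like this: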

\SomeRootDirectory
    \foo.txt
    \bar.txt
    \Stories
        \Horror
            \scary.txt
            \Trash
                \notscary.txt
        \Cyberpunk
    \Poems
        \doyoureadme.txt

would look like this (# stands for a checkbox):

SomeRootDirectory
    # foo.txt
    # bar.txt
    Stories
        Horror
            # scary.txt
            Trash
                # notscary.txt
        Cyberpunk
    Poems
        # doyoureadme.txt

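Building the original dictionary from the folder structure is actually easy; I found a recipe at ActiveState (see below). But when I try to traverse this nicely nested dictionary, I get stuck.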
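For readers who want something to experiment with, here is a minimal sketch of one way to build such a dictionary with `os.walk`. It is not the ActiveState recipe mentioned above; the name `build_tree` and the convention that files map to `None` and folders map to dicts are assumptions chosen to match the examples in this thread.

```python
import os

def build_tree(root):
    """Return {basename(root): nested dict}; files map to None, folders to dicts.

    Hypothetical helper for illustration, not the ActiveState recipe.
    """
    root = os.path.abspath(root)
    tree = {}
    # Maps each directory path seen so far to the dict that represents it.
    nodes = {root: tree}
    # os.walk is top-down by default, so a parent is always visited
    # before its children and its dict is already registered in `nodes`.
    for dirpath, dirnames, filenames in os.walk(root):
        node = nodes[dirpath]
        for name in dirnames:
            node[name] = {}
            nodes[os.path.join(dirpath, name)] = node[name]
        for name in filenames:
            node[name] = None
    return {os.path.basename(root): tree}
```

An empty folder ends up as an empty dict, which keeps it distinguishable from a file.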

5 Answers

2

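I know this question is old, but I was just looking for a simple, clean way to walk nested dictionaries, and this was the closest thing my limited searching turned up. oadams's answer isn't much use if you want more than just the file names, and spicavigo's answer looks a bit complicated.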

In the end I wrote my own. It works much like os.walk does for directories, except that it returns all of the key/value information.

It returns an iterator. For each directory in the "tree", the iterator yields (path, sub-dicts, values), where:

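  • path is the path to the dict
  • sub-dicts is a sequence of (key, dict) pairs, one for each sub-dict in this dict
  • values is a tuple of (key, value) pairs, one for each (non-dict) item in this dict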

def walk(d):
    '''
    Walk a tree (nested dicts).
    
    For each 'path', or dict, in the tree, returns a 3-tuple containing:
    (path, sub-dicts, values)
    
    where:
    * path is the path to the dict
    * sub-dicts is a tuple of (key,dict) pairs for each sub-dict in this dict
    * values is a tuple of (key,value) pairs for each (non-dict) item in this dict
    '''
    # nested dict keys
    nested_keys = tuple(k for k in d.keys() if isinstance(d[k],dict))
    # key/value pairs for non-dicts
    items = tuple((k,d[k]) for k in d.keys() if k not in nested_keys)
    
    # return path, key/sub-dict pairs, and key/value pairs
    yield ('/', [(k,d[k]) for k in nested_keys], items)
    
    # recurse each subdict
    for k in nested_keys:
        for res in walk(d[k]):
            # for each result, stick key in path and pass on
            res = ('/%s' % k + res[0], res[1], res[2])
            yield res
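When only the leaf entries matter (for instance, to map each checkbox back to a full path), the same recursive idea can be reduced to a generator of (path, value) pairs. This is a minimal sketch for Python 3, not part of the answer's code; `iter_leaves` and `example` are hypothetical names.

```python
def iter_leaves(d, prefix=''):
    """Yield (path, value) for every non-dict entry in a nested dict."""
    for key, value in d.items():
        path = prefix + '/' + str(key)
        if isinstance(value, dict):
            # Descend, carrying along the path built so far.
            yield from iter_leaves(value, path)
        else:
            yield (path, value)

example = {
    'SomeRootDirectory': {
        'foo.txt': None,
        'Stories': {'Horror': {'scary.txt': None}, 'Cyberpunk': {}},
    }
}

for path, value in iter_leaves(example):
    print(path)
```

Note that an empty sub-dict such as 'Cyberpunk' yields nothing here, whereas walk above still reports it as a path.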

Here is the code I used to test it, although it also contains some other unrelated (but neat) stuff:

import json  # the standard-library module is enough here (originally simplejson)
from collections import defaultdict

# see https://gist.github.com/2012250
tree = lambda: defaultdict(tree)

def walk(d):
    '''
    Walk a tree (nested dicts).
    
    For each 'path', or dict, in the tree, returns a 3-tuple containing:
    (path, sub-dicts, values)
    
    where:
    * path is the path to the dict
    * sub-dicts is a tuple of (key,dict) pairs for each sub-dict in this dict
    * values is a tuple of (key,value) pairs for each (non-dict) item in this dict
    '''
    # nested dict keys
    nested_keys = tuple(k for k in d.keys() if isinstance(d[k],dict))
    # key/value pairs for non-dicts
    items = tuple((k,d[k]) for k in d.keys() if k not in nested_keys)
    
    # return path, key/sub-dict pairs, and key/value pairs
    yield ('/', [(k,d[k]) for k in nested_keys], items)
    
    # recurse each subdict
    for k in nested_keys:
        for res in walk(d[k]):
            # for each result, stick key in path and pass on
            res = ('/%s' % k + res[0], res[1], res[2])
            yield res

# use fancy tree to store arbitrary nested paths/values
mem = tree()

root = mem['SomeRootDirectory']
root['foo.txt'] = None
root['bar.txt'] = None
root['Stories']['Horror']['scary.txt'] = None
root['Stories']['Horror']['Trash']['notscary.txt'] = None
root['Stories']['Cyberpunk']
root['Poems']['doyoureadme.txt'] = None

# convert to json string
s = json.dumps(mem, indent=2)

#print(mem)
print(s)
print()

# json.loads converts to nested dicts, need to walk them
for (path, dicts, items) in walk(json.loads(s)):
    # this will print every path
    print('[%s]' % path)
    for key, val in items:
        # this will print every key,value pair (skips empty paths)
        print('%s = %s' % (path + key, val))
    print()

The output looks like this (key order may differ depending on the Python version):

{
  "SomeRootDirectory": {
    "foo.txt": null,
    "Stories": {
      "Horror": {
        "scary.txt": null,
        "Trash": {
          "notscary.txt": null
        }
      },
      "Cyberpunk": {}
    },
    "Poems": {
      "doyoureadme.txt": null
    },
    "bar.txt": null
  }
}

[/]

[/SomeRootDirectory/]
/SomeRootDirectory/foo.txt = None
/SomeRootDirectory/bar.txt = None

[/SomeRootDirectory/Stories/]

[/SomeRootDirectory/Stories/Horror/]
/SomeRootDirectory/Stories/Horror/scary.txt = None

[/SomeRootDirectory/Stories/Horror/Trash/]
/SomeRootDirectory/Stories/Horror/Trash/notscary.txt = None

[/SomeRootDirectory/Stories/Cyberpunk/]

[/SomeRootDirectory/Poems/]
/SomeRootDirectory/Poems/doyoureadme.txt = None

9

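Here's a function that prints all of your file names. It goes through every key in the dictionary; if the value for a key is not a dictionary (in your case, the key is a file name), it prints the name. If the value is a dictionary, the function calls itself on that dictionary.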

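def print_all_files(directory):
    # Keys are names; a dict value marks a folder, anything else a file.
    for filename in directory.keys():
        if not isinstance(directory[filename], dict):
            print(filename)
        else:
            # Recurse into the sub-folder.
            print_all_files(directory[filename])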

You can adapt this code to whatever you need; it is only an example showing how recursion lets you avoid dealing with depth explicitly.

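The key thing to understand is that each call to print_all_files has no idea how deep it is in the tree. It only looks at the files in front of it and prints their names. If it encounters folders, it runs itself on them.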
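If the depth is needed after all, for example to indent the entries as in the question's mock-up, it can simply be passed down as an argument. A minimal sketch for Python 3.7+ (where dicts keep insertion order); `render_tree`, `example`, and the '#' checkbox marker are assumptions for illustration, with plain text standing in for Tkinter widgets.

```python
def render_tree(directory, depth=0, indent='    '):
    """Return the lines of an indented listing; '#' marks a file (checkbox)."""
    lines = []
    for name, value in directory.items():
        if isinstance(value, dict):
            lines.append(indent * depth + name)
            # The recursive call is told its depth; it never has to discover it.
            lines.extend(render_tree(value, depth + 1, indent))
        else:
            lines.append(indent * depth + '# ' + name)
    return lines

example = {
    'SomeRootDirectory': {
        'foo.txt': None,
        'bar.txt': None,
        'Stories': {
            'Horror': {'scary.txt': None, 'Trash': {'notscary.txt': None}},
            'Cyberpunk': {},
        },
        'Poems': {'doyoureadme.txt': None},
    }
}

print('\n'.join(render_tree(example)))
```

In a GUI version, the two `lines.append` calls would be the places to create a label or a checkbox, using `depth` to compute the horizontal offset.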

4

Here is some preliminary code. Take a look and tell me where you run into problems.

Parents = {-1: "Root"}  # maps a directory's index to its name

def add_dir(level, parent, index, k):
    print("Directory")
    print("Level=%d, Parent=%s, Index=%d, value=%s" % (level, Parents[parent], index, k))

def add_file(parent, index, k):
    print("File")
    print("Parent=%s, Index=%d, value=%s" % (Parents[parent], index, k))

def f(level=0, parent=-1, index=0, di=None):
    # Returns the last index used, so that later siblings get unique indices.
    for k in (di or {}):
        index += 1
        if isinstance(di[k], dict):
            Parents[index] = k
            add_dir(level, parent, index, k)
            index = f(level + 1, index, index, di[k])
        else:
            add_file(parent, index, k)
    return index

a={
    'SomeRootDirectory': {
        'foo.txt': None,
        'bar.txt': None,
        'Stories': {
            'Horror': {
                'scary.txt' : None,
                'Trash' : {
                    'notscary.txt' : None,
                    },
                },
            'Cyberpunk' : {}  # empty directory in the question's example
            },
        'Poems' : {
            'doyoureadme.txt' : None
        }
    }
}

f(di=a)
