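Iterating over a nested dictionary of arbitrary depth (the dictionary represents a directory tree)
Please keep in mind that I was still new to Python when I wrote this.
I ran into this problem because I want to let the user select a group of files from a folder (and its subfolders), and unfortunately Tkinter's multi-file selection is buggy on Windows 7 (see http://bugs.python.org/issue8010).
So I am trying to represent the folder structure another way (still using Tkinter): building a facsimile of the folder structure out of labeled, indented checkboxes organized as a tree. For example, a folder like this: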
\SomeRootDirectory
    \foo.txt
    \bar.txt
    \Stories
        \Horror
            \scary.txt
            \Trash
                \notscary.txt
        \Cyberpunk
    \Poems
        \doyoureadme.txt
would look like this (where # stands for a checkbox):
SomeRootDirectory
    # foo.txt
    # bar.txt
    Stories
        Horror
            # scary.txt
            Trash
                # notscary.txt
        Cyberpunk
    Poems
        # doyoureadme.txt
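Building the original dictionary from the folder structure is easy; I found a recipe for it on ActiveState (see below). But I hit a wall as soon as I tried to iterate over this nice nested dictionary.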
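For reference, here is a minimal sketch of how such a dictionary could be built. It is not the ActiveState recipe mentioned above, only an illustration under my own assumptions: `build_tree` is a hypothetical name, folders become nested dicts, and files map to None.

```python
import os

def build_tree(root_dir):
    """Return a nested dict for root_dir: folders map to dicts, files to None.

    Hypothetical helper for illustration only. Symlinked folders are
    followed, because os.path.isdir follows symbolic links.
    """
    tree = {}
    for entry in sorted(os.listdir(root_dir)):
        full_path = os.path.join(root_dir, entry)
        if os.path.isdir(full_path):
            # A folder: recurse and store its contents as a sub-dict
            tree[entry] = build_tree(full_path)
        else:
            # A file: there is nothing to store, so use None as the value
            tree[entry] = None
    return tree
```

Wrapping the result as `{os.path.basename(path): build_tree(path)}` would give the top-level folder its own key, as in the SomeRootDirectory example.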
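5 Answers
I know this is an old question, but I was just looking for a simple, clean way to walk nested dicts, and this is the closest thing my limited searching turned up. oadams' answer is not much use if you want more than just the file names, and spicavigo's answer looks somewhat complicated.
I ended up rolling my own, which works much like os.walk does for directories, except that it returns all of the key/value information.
It returns an iterator, and for each directory in the "tree" of nested dicts the iterator yields (path, sub-dicts, values), where:
- path is the path to the dict
- sub-dicts is a tuple of (key, dict) pairs, one for each sub-dict in this dict
- values is a tuple of (key, value) pairs, one for each (non-dict) item in this dict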
def walk(d):
    '''
    Walk a tree (nested dicts).

    For each 'path', or dict, in the tree, returns a 3-tuple containing:
    (path, sub-dicts, values)

    where:
    * path is the path to the dict
    * sub-dicts is a tuple of (key,dict) pairs for each sub-dict in this dict
    * values is a tuple of (key,value) pairs for each (non-dict) item in this dict
    '''
    # nested dict keys
    nested_keys = tuple(k for k in d.keys() if isinstance(d[k], dict))
    # key/value pairs for non-dicts
    items = tuple((k, d[k]) for k in d.keys() if k not in nested_keys)

    # return path, key/sub-dict pairs, and key/value pairs
    yield ('/', tuple((k, d[k]) for k in nested_keys), items)

    # recurse each subdict
    for k in nested_keys:
        for res in walk(d[k]):
            # for each result, stick key in path and pass on
            res = ('/%s' % k + res[0], res[1], res[2])
            yield res
Here is the code I used to test it, although it also contains a couple of unrelated (but neat) extras:
from __future__ import print_function  # lets the print() calls run on Python 2 as well

import json  # standard-library module; simplejson exposes the same dumps/loads calls
from collections import defaultdict

# see https://gist.github.com/2012250
tree = lambda: defaultdict(tree)

def walk(d):
    '''
    Walk a tree (nested dicts).

    For each 'path', or dict, in the tree, returns a 3-tuple containing:
    (path, sub-dicts, values)

    where:
    * path is the path to the dict
    * sub-dicts is a tuple of (key,dict) pairs for each sub-dict in this dict
    * values is a tuple of (key,value) pairs for each (non-dict) item in this dict
    '''
    # nested dict keys
    nested_keys = tuple(k for k in d.keys() if isinstance(d[k], dict))
    # key/value pairs for non-dicts
    items = tuple((k, d[k]) for k in d.keys() if k not in nested_keys)

    # return path, key/sub-dict pairs, and key/value pairs
    yield ('/', tuple((k, d[k]) for k in nested_keys), items)

    # recurse each subdict
    for k in nested_keys:
        for res in walk(d[k]):
            # for each result, stick key in path and pass on
            res = ('/%s' % k + res[0], res[1], res[2])
            yield res

# use fancy tree to store arbitrary nested paths/values
mem = tree()

root = mem['SomeRootDirectory']
root['foo.txt'] = None
root['bar.txt'] = None
root['Stories']['Horror']['scary.txt'] = None
root['Stories']['Horror']['Trash']['notscary.txt'] = None
root['Stories']['Cyberpunk']  # merely accessing the key creates an empty sub-tree
root['Poems']['doyoureadme.txt'] = None

# convert to json string
s = json.dumps(mem, indent=2)

#print(mem)
print(s)
print()

# json.loads converts to nested dicts, need to walk them
for (path, dicts, items) in walk(json.loads(s)):
    # this will print every path
    print('[%s]' % path)
    for key, val in items:
        # this will print every key,value pair (skips empty paths)
        print('%s = %s' % (path + key, val))
    print()
The output looks like this:
{
  "SomeRootDirectory": {
    "foo.txt": null,
    "Stories": {
      "Horror": {
        "scary.txt": null,
        "Trash": {
          "notscary.txt": null
        }
      },
      "Cyberpunk": {}
    },
    "Poems": {
      "doyoureadme.txt": null
    },
    "bar.txt": null
  }
}
[/]
[/SomeRootDirectory/]
/SomeRootDirectory/foo.txt = None
/SomeRootDirectory/bar.txt = None
[/SomeRootDirectory/Stories/]
[/SomeRootDirectory/Stories/Horror/]
/SomeRootDirectory/Stories/Horror/scary.txt = None
[/SomeRootDirectory/Stories/Horror/Trash/]
/SomeRootDirectory/Stories/Horror/Trash/notscary.txt = None
[/SomeRootDirectory/Stories/Cyberpunk/]
[/SomeRootDirectory/Poems/]
/SomeRootDirectory/Poems/doyoureadme.txt = None
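Here is a function that prints all of your file names. It goes through every key in the dictionary, and if a key maps to something that is not a dictionary (in your case, a file name), it prints the name. Otherwise, it calls itself on that dictionary.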
def print_all_files(directory):
    for filename in directory.keys():
        if not isinstance(directory[filename], dict):
            # not a dict, so this key is a file name
            print(filename)
        else:
            # a dict is a sub-directory: recurse into it
            print_all_files(directory[filename])
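This code can be adapted to do whatever you need; it is only an example of how recursion lets you avoid dealing with the depth explicitly.
The key thing to understand is that each time print_all_files is called, it has no knowledge of how deep it is in the tree. It only looks at the entries right in front of it and prints their names. If it encounters a directory, it runs itself on that directory.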
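If you do need the depth, for example to indent the checkboxes, one option is to pass it down as an extra argument. The following is only a sketch under that assumption, written for Python 3; `iter_tree` and the `example` data are hypothetical names made up for illustration, and the `#` marker just mimics the checkbox notation from the question.

```python
def iter_tree(directory, depth=0):
    """Yield (depth, name, is_folder) for every entry, parents before children."""
    for name, value in directory.items():
        is_folder = isinstance(value, dict)
        yield depth, name, is_folder
        if is_folder:
            # Same recursion as before, but each level adds one to the depth
            yield from iter_tree(value, depth + 1)

# Hypothetical sample data in the same shape as the question's dictionary
example = {
    'SomeRootDirectory': {
        'foo.txt': None,
        'Stories': {'Horror': {'scary.txt': None}, 'Cyberpunk': {}},
    }
}

for depth, name, is_folder in iter_tree(example):
    marker = '' if is_folder else '# '
    print('    ' * depth + marker + name)
```

Because it is a generator, the caller decides what to do with each entry (print it, create a widget, collect paths) while the traversal itself stays the same.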
Here is some preliminary code. Go through it and tell me where you get stuck.
# Maps an index to the name of the directory registered under it
Parents = {-1: "Root"}

def add_dir(level, parent, index, k):
    print("Directory")
    print("Level=%d, Parent=%s, Index=%d, value=%s" % (level, Parents[parent], index, k))

def add_file(parent, index, k):
    print("File")
    print("Parent=%s, Index=%d, value=%s" % (Parents[parent], index, k))

def f(level=0, parent=-1, index=0, di=None):
    # index is local to each call, so its values can repeat across branches
    for k in (di or {}):
        index += 1
        if di[k]:
            # a non-empty dict is a directory: remember its name, then recurse
            Parents[index] = k
            add_dir(level, parent, index, k)
            f(level + 1, index, index, di[k])
        else:
            # None (or an empty dict) is reported as a file
            add_file(parent, index, k)

a = {
    'SomeRootDirectory': {
        'foo.txt': None,
        'bar.txt': None,
        'Stories': {
            'Horror': {
                'scary.txt': None,
                'Trash': {
                    'notscary.txt': None,
                },
            },
            'Cyberpunk': None
        },
        'Poems': {
            'doyoureadme.txt': None
        }
    }
}

f(di=a)
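If the widgets need identifiers that are unique across the whole tree, a variation is to hand out ids from a single counter and walk the tree with an explicit stack instead of recursion. This is only a rough sketch, not a drop-in replacement for the code above: `flatten_tree` and the row layout are hypothetical, parent id 0 stands for the (virtual) root, and any dict, even an empty one, is treated as a folder.

```python
from itertools import count

def flatten_tree(tree):
    """Return a list of (node_id, parent_id, level, name, is_folder) rows."""
    ids = count(1)  # one shared counter, so every node gets a distinct id
    rows = []
    # Each stack entry: (sub-dict, id of the folder that owns it, its level)
    stack = [(tree, 0, 0)]
    while stack:
        current, parent_id, level = stack.pop()
        for name, value in current.items():
            node_id = next(ids)
            is_folder = isinstance(value, dict)
            rows.append((node_id, parent_id, level, name, is_folder))
            if is_folder:
                # Visit this folder's contents later, remembering its id
                stack.append((value, node_id, level + 1))
    return rows
```

Note that the rows come out grouped by folder rather than in strict top-to-bottom display order, so a caller that wants the indented layout would still need to arrange them using parent_id.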