Python: how do I get directory paths from JSON?
I receive a response from an API in JSON format:
"files":[
  {
    "name":"main",
    "node_type":"directory",
    "files":[
      {
        "name":"source1",
        "node_type":"directory",
        "files":[
          {
            "name":"letters",
            "node_type":"directory",
            "files":[
              {
                "name":"messages.po",
                "node_type":"file",
                "created":"2014-08-14 08:51:41",
                "last_updated":"2014-08-14 08:51:42",
                "last_accessed":"0000-00-00 00:00:00"
              }
            ]
          }
        ]
      },
      {
        "name":"source2",
        "node_type":"directory",
        "files":[
        ]
      }
    ]
  },
  {
    "name":"New Directory",
    "node_type":"directory",
    "files":[
      {
        "name":"prefs.js",
        "node_type":"file",
        "created":"2014-08-14 08:11:53",
        "last_updated":"2014-08-14 08:11:53",
        "last_accessed":"0000-00-00 00:00:00"
      }
    ]
  },
  {
    "name":"111",
    "node_type":"directory",
    "files":[
      {
        "name":"222",
        "node_type":"directory",
        "files":[
          {
            "name":"333",
            "node_type":"directory",
            "files":[
              {
                "name":"cli.mo",
                "node_type":"file",
                "created":"2014-08-14 08:51:30",
                "last_updated":"2014-08-14 08:51:30",
                "last_accessed":"0000-00-00 00:00:00"
              }
            ]
          }
        ]
      }
    ]
  }
],
The structure of the project is:
├──111──222──333───cli.mo
├──main──┬──source1──letters───messages.po
│        └──source2
└──New Directory──prefs.js
How can I parse this JSON so that I get a result like this:
/111/222/333/cli.mo
/main/source1/letters/messages.po
/main/source2/
/New Directory/prefs.js
I tried writing some Python code, but I'm still a beginner and all my attempts failed.
3 answers
0
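I think the best way to handle this is the same way the Unix ls -R command and Python's os.walk() function do it: recursively. For example, if you want to list every entry, directories included, you can do this: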
def walk(tree, path=''):
    dirs = []
    for f in tree:
        print(path + '/' + f['name'])
        if f['node_type'] == 'directory':
            # remember each directory's own name together with its children
            dirs.append((f['name'], f['files']))
    for name, subtree in dirs:
        walk(subtree, path + '/' + name)
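Note that the recursive call has to use each directory's own name: reusing f['name'] after the first loop has finished would attach every subtree to whichever entry happened to be last. A quick way to check this is to run the traversal on a trimmed-down copy of the question's data. The sketch below uses a hypothetical variant, walk_paths (not part of the answer), that collects the lines into a list instead of printing them:

```python
import json

def walk_paths(tree, path=''):
    """Return every entry (directories and files) as a '/'-joined path."""
    result = []
    dirs = []
    for f in tree:
        result.append(path + '/' + f['name'])
        if f['node_type'] == 'directory':
            # keep the directory's own name together with its children
            dirs.append((f['name'], f['files']))
    for name, subtree in dirs:
        result.extend(walk_paths(subtree, path + '/' + name))
    return result

# trimmed sample of the API response (only the keys the function reads)
sample = json.loads('''
[{"name": "main", "node_type": "directory", "files": [
    {"name": "source1", "node_type": "directory", "files": [
        {"name": "letters", "node_type": "directory", "files": [
            {"name": "messages.po", "node_type": "file"}]}]},
    {"name": "source2", "node_type": "directory", "files": []}]},
 {"name": "New Directory", "node_type": "directory", "files": [
    {"name": "prefs.js", "node_type": "file"}]}]
''')

for line in walk_paths(sample):
    print(line)
```

Like ls -R, this lists all entries of one level before descending into each directory, so /main/source1 appears before /main/source1/letters.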
1
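What you need is essentially a recursive-descent parser. The json module does the heavy lifting of parsing the JSON syntax, but you still have to walk the resulting data structure and make sense of it. Recursion fits here because you don't know in advance how many levels of directories you will encounter.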
jdata = """
[{"files": [{"files": [{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:42", "name": "messages.po",
"created": "2014-08-14 08:51:41"}], "node_type": "directory", "name": "letters"}], "node_type": "directory", "name": "source1"}, {"files": [], "node_type":
"directory", "name": "source2"}], "node_type": "directory", "name": "main"}, {"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:11:53", "name": "prefs.js", "created": "2014-08-14 08:11:53"}], "node_type": "directory", "name": "New Directory"}, {"files": [{"files": [
{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:30", "name": "cli.mo", "created": "2014-08-14 08:51:30"}], "node_type": "directory", "name": "333"}], "node_type": "directory", "name": "222"}], "node_type": "directory", "name": "111"}]
"""
import json
import posixpath  # always joins with '/', unlike os.path on Windows
import sys

if sys.version_info[0] > 2:
    unicode = str  # Python 3 has no separate unicode type


class Filepaths(object):

    def __init__(self, data):
        """
        Discover file paths in the given data. If the data is a JSON string,
        decode it. If it is already decoded into Python structures, use it directly.
        """
        self.paths = []
        if isinstance(data, (str, unicode)):
            data = json.loads(data)
        self.traverse(data)
        # list(...) so that p.paths can be iterated more than once
        self.paths = list(reversed(self.paths))

    def traverse(self, n, prefix="/"):
        """
        Traverse the data tree. On terminal nodes (files and empty
        directories), add the path found to self.paths.
        """
        if isinstance(n, list):
            for item in n:
                self.traverse(item, prefix)
        elif isinstance(n, dict):
            nodetype = n['node_type']
            nodename = n['name']
            if nodetype == 'directory':
                files = n['files']
                if files:
                    for f in files:
                        self.traverse(f, posixpath.join(prefix, nodename))
                else:
                    # empty directory: mark it with a trailing slash
                    self.paths.append(posixpath.join(prefix, nodename) + '/')
            elif nodetype == 'file':
                self.paths.append(posixpath.join(prefix, nodename))
            else:
                raise ValueError("didn't understand node named {0!r}, type {1!r}".format(nodename, nodetype))
        else:
            raise ValueError("didn't understand node {0!r}".format(n))


p = Filepaths(jdata)
for path in p.paths:
    print(path)
This produces:
/111/222/333/cli.mo
/New Directory/prefs.js
/main/source2/
/main/source1/letters/messages.po
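The class returns the paths in reverse traversal order, which is not the order shown in the question. If the order matters, one option (an assumption on my part, since the question doesn't say how its list is ordered) is to sort case-insensitively; applied to the four paths above, this reproduces the question's order:

```python
paths = [
    '/111/222/333/cli.mo',
    '/New Directory/prefs.js',
    '/main/source2/',
    '/main/source1/letters/messages.po',
]

# str.lower as the key makes 'main' and 'New Directory' compare case-insensitively
for path in sorted(paths, key=str.lower):
    print(path)
```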
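Note that I used a class rather than a plain recursive function, to sidestep Python's rules about global variables. Yes, I could declare a global variable paths and mark it global inside the function, but that gets clumsy. Objects are Python's standard way of "packaging" routines together with the data they need to access, and in my experience recursive traversals in Python often read better when written this way.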
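For comparison, the traversal can also avoid both the class and the global by passing the result list down as an argument. This is a sketch of that alternative (the name collect_paths and its paths parameter are my own); it follows the same rules as traverse above, with empty directories getting a trailing slash, but leaves out the error handling for unknown node types:

```python
import json
import posixpath

def collect_paths(nodes, prefix='/', paths=None):
    """Append leaf paths (files and empty directories) to `paths` and return it."""
    if paths is None:
        paths = []  # fresh list per top-level call, avoiding a shared mutable default
    for node in nodes:
        full = posixpath.join(prefix, node['name'])
        if node['node_type'] == 'directory':
            if node['files']:
                collect_paths(node['files'], full, paths)
            else:
                paths.append(full + '/')  # empty directory, mark with a slash
        else:
            paths.append(full)
    return paths

data = json.loads('''
[{"name": "main", "node_type": "directory", "files": [
    {"name": "source1", "node_type": "directory", "files": [
        {"name": "letters", "node_type": "directory", "files": [
            {"name": "messages.po", "node_type": "file"}]}]},
    {"name": "source2", "node_type": "directory", "files": []}]},
 {"name": "New Directory", "node_type": "directory", "files": [
    {"name": "prefs.js", "node_type": "file"}]}]
''')
print(collect_paths(data))
```

The paths=None default matters: a literal [] default would be shared across calls and keep growing.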
3
If what you actually want back is the path strings, I'd suggest using a generator:
def parse(data, parent=''):
    if data is None or not len(data):
        yield parent
    else:
        for node in data:
            for result in parse(
                    node.get('files'), parent + '/' + node.get('name')):
                yield result
You can also use a variant of the yield parent statement so that /main/source2 is returned with a trailing slash (/main/source2/), though I find it a bit verbose:
yield parent + ('/' if data is not None and not len(data) else '')
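Put together, the variant looks like this (parse_with_slash is a name I made up so it can sit next to the original parse). Files have no "files" key, so node.get('files') returns None for them, while an empty directory has an empty list; only the latter gets the slash:

```python
def parse_with_slash(data, parent=''):
    if data is None or not len(data):
        # None means a file (no 'files' key); [] means an empty directory
        yield parent + ('/' if data is not None and not len(data) else '')
    else:
        for node in data:
            for result in parse_with_slash(
                    node.get('files'), parent + '/' + node.get('name')):
                yield result

tree = [
    {'name': 'main', 'node_type': 'directory', 'files': [
        {'name': 'source1', 'node_type': 'directory', 'files': [
            {'name': 'letters', 'node_type': 'directory', 'files': [
                {'name': 'messages.po', 'node_type': 'file'}]}]},
        {'name': 'source2', 'node_type': 'directory', 'files': []}]},
]
for item in parse_with_slash(tree):
    print(item)
```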
Pass the list you decoded from JSON to the parse function above, and you get an iterator that yields the path strings found in the data:
import json
# shamelessly ignoring PEP8 for the sake of space
data = '''
[{"files": [{"files": [{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:42",
"name": "messages.po", "created": "2014-08-14 08:51:41"}], "node_type": "directory", "name": "letters"}], "node_type": "directory",
"name": "source1"}, {"files": [], "node_type": "directory", "name": "source2"}], "node_type": "directory", "name": "main"}, {"files":
[{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:11:53", "name": "prefs.js", "created":
"2014-08-14 08:11:53"}], "node_type": "directory", "name": "New Directory"}, {"files": [{"files": [{"files": [{"node_type": "file",
"last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:30", "name": "cli.mo", "created": "2014-08-14 08:51:30"}],
"node_type": "directory", "name": "333"}], "node_type": "directory", "name": "222"}], "node_type": "directory", "name": "111"}]
'''
for item in parse(json.loads(data)):
    print(item)
Running the code above prints:
/main/source1/letters/messages.po
/main/source2
/New Directory/prefs.js
/111/222/333/cli.mo
There is an interesting thread about generators on SO, "What does the "yield" keyword do in Python?"; I recommend reading all of its answers.
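On Python 3.3 and later, the inner "for result in ...: yield result" loop can be replaced by yield from (PEP 380), which delegates directly to the sub-generator. A sketch of the same generator written that way (parse3 is a hypothetical name; the check "not data" treats None and [] the same way as the original "data is None or not len(data)", given that data is always a list or None here):

```python
import json

def parse3(data, parent=''):
    if not data:
        # None (a file) or [] (an empty directory): this path is a leaf
        yield parent
    else:
        for node in data:
            yield from parse3(node.get('files'), parent + '/' + node['name'])

data = json.loads('[{"name": "New Directory", "node_type": "directory", '
                  '"files": [{"name": "prefs.js", "node_type": "file"}]}, '
                  '{"name": "111", "node_type": "directory", "files": '
                  '[{"name": "222", "node_type": "directory", "files": '
                  '[{"name": "333", "node_type": "directory", "files": '
                  '[{"name": "cli.mo", "node_type": "file"}]}]}]}]')
for item in parse3(data):
    print(item)
```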