Python: How do I get directory paths from JSON?

2 votes
3 answers
4938 views
Asked 2025-04-18 17:21

I receive a JSON response from an API:

    "files":[
    {
      "name":"main",
      "node_type":"directory",
      "files":[
        {
          "name":"source1",
          "node_type":"directory",
          "files":[
            {
              "name":"letters",
              "node_type":"directory",
              "files":[
                {
                  "name":"messages.po",
                  "node_type":"file",
                  "created":"2014-08-14 08:51:41",
                  "last_updated":"2014-08-14 08:51:42",
                  "last_accessed":"0000-00-00 00:00:00"
                }
              ]
            }
          ]
        },
        {
          "name":"source2",
          "node_type":"directory",
          "files":[

          ]
        }
      ]
    },
    {
      "name":"New Directory",
      "node_type":"directory",
      "files":[
        {
          "name":"prefs.js",
          "node_type":"file",
          "created":"2014-08-14 08:11:53",
          "last_updated":"2014-08-14 08:11:53",
          "last_accessed":"0000-00-00 00:00:00"
        }
      ]
    },
    {
      "name":"111",
      "node_type":"directory",
      "files":[
        {
          "name":"222",
          "node_type":"directory",
          "files":[
            {
              "name":"333",
              "node_type":"directory",
              "files":[
                {
                  "name":"cli.mo",
                  "node_type":"file",
                  "created":"2014-08-14 08:51:30",
                  "last_updated":"2014-08-14 08:51:30",
                  "last_accessed":"0000-00-00 00:00:00"
                }
              ]
            }
          ]
        }
      ]
    }
  ],

The structure of the project is:

├──111──222──333───cli.mo
├──main──source1──letters───messages.po
         └──source2
├──New Directory──prefs.js

How can I parse this JSON so that I get a result like this:

/111/222/333/cli.mo
/main/source1/letters/messages.po
/main/source2/
/New Directory/prefs.js

I tried writing some Python code for this, but I am still a beginner and every attempt failed.

3 Answers

0

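I think the best way to handle this is the approach used by the Unix ls -R command and Python's os.walk() function: recursion. For example, to list every entry, directories included, you could do this: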

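def walk(tree, path=''):
  # Remember each subdirectory together with its own full path
  dirs = []
  for f in tree:
    current = path + '/' + f['name']
    print(current)
    if f['node_type'] == 'directory':
      dirs.append((current, f['files']))

  # Recurse only after the current level has been printed, like ls -R
  for subpath, subtree in dirs:
    walk(subtree, subpath)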
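
The function above prints every node, intermediate directories included. To get only the paths asked for in the question (files, plus empty directories with a trailing slash), a minimal sketch could collect the results in a list instead. The helper name leaf_paths and the trimmed-down sample tree are illustrative assumptions, not part of the API response.

```python
def leaf_paths(tree, path=''):
    """Return paths of files and empty directories (hypothetical helper)."""
    result = []
    for node in tree:
        current = path + '/' + node['name']
        if node['node_type'] == 'directory':
            children = node.get('files') or []
            if children:
                # Non-empty directory: only its descendants are reported
                result.extend(leaf_paths(children, current))
            else:
                # Empty directory: mark it with a trailing slash
                result.append(current + '/')
        else:
            result.append(current)
    return result


# Small sample in the same shape as the API response (assumed data)
sample = [
    {"name": "main", "node_type": "directory", "files": [
        {"name": "source1", "node_type": "directory", "files": [
            {"name": "messages.po", "node_type": "file"}]},
        {"name": "source2", "node_type": "directory", "files": []},
    ]},
]

for p in leaf_paths(sample):
    print(p)
```
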
1

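What you need is something like a recursive descent parser. The json module does much of the heavy lifting of parsing the JSON syntax, but you still have to walk the decoded data structure and make sense of it. Recursion fits here because you don't know in advance how many levels of directories you will encounter.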

jdata = """
[{"files": [{"files": [{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:42", "name": "messages.po",
"created": "2014-08-14 08:51:41"}], "node_type": "directory", "name": "letters"}], "node_type": "directory", "name": "source1"}, {"files": [], "node_type":
"directory", "name": "source2"}], "node_type": "directory", "name": "main"}, {"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:11:53", "name": "prefs.js", "created": "2014-08-14 08:11:53"}], "node_type": "directory", "name": "New Directory"}, {"files": [{"files": [
{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:30", "name": "cli.mo", "created": "2014-08-14 08:51:30"}], "node_type": "directory", "name": "333"}], "node_type": "directory", "name": "222"}], "node_type": "directory", "name": "111"}]
"""

import json
import os
import sys

if sys.version_info[0] > 2:
    unicode = str

class Filepaths(object):

    def __init__(self, data):
        """
        Discover file paths in the given data. If the data is JSON string,
        decode it. If already decoded into Python structures, use it directly.
        """
        self.paths = []
        if isinstance(data, (str, unicode)):
            data = json.loads(data)
        self.traverse(data)
        # Store a list (not a one-shot iterator) so paths can be iterated repeatedly
        self.paths = list(reversed(self.paths))

    def traverse(self, n, prefix="/"):
        """
        Traverse the data tree. On terminal nodes, add files and directories
        found to self.paths
        """
        if isinstance(n, list):
            for item in n:
                self.traverse(item, prefix)
        elif isinstance(n, dict):
            nodetype = n['node_type']
            nodename = n['name']
            if nodetype == 'directory':
                files = n['files']
                if files:
                    for f in files:
                        self.traverse(f, os.path.join(prefix, nodename))
                else:
                    self.paths.append(os.path.join(prefix, nodename) + '/')
            elif nodetype == 'file':
                self.paths.append(os.path.join(prefix, nodename))
            else:
                raise ValueError("didn't understand node named {0!r}, type {1!r}".format(nodename, nodetype))
        else:
            raise ValueError("didn't understand node {0!r}".format(n))

p = Filepaths(jdata)
for path in p.paths:
    print(path)

This produces:

/111/222/333/cli.mo
/New Directory/prefs.js
/main/source2/
/main/source1/letters/messages.po

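Note that I used a class rather than a bare recursive function, which sidesteps Python's strict rules about global variables. Yes, I could have declared a global variable paths and marked it global inside the function, but that gets clumsy. Objects are Python's standard way of packaging routines together with the data they need to access. In Python, recursive traversals usually work out better with objects.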
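
For comparison, the same traversal can also be written as a plain function that passes an accumulator list down through the recursion, which needs neither a class nor a global declaration. This is only a sketch of that alternative: collect_paths and sample are hypothetical names, and posixpath.join is swapped in for os.path.join so the result uses forward slashes on every platform.

```python
import posixpath


def collect_paths(node, prefix='/', paths=None):
    """Append file and empty-directory paths under node to paths; return paths."""
    if paths is None:
        paths = []
    if isinstance(node, list):
        for item in node:
            collect_paths(item, prefix, paths)
    else:
        full = posixpath.join(prefix, node['name'])
        if node['node_type'] == 'directory':
            if node['files']:
                collect_paths(node['files'], full, paths)
            else:
                # Empty directory: keep it, marked with a trailing slash
                paths.append(full + '/')
        else:
            paths.append(full)
    return paths


# Assumed sample data in the same shape as the API response
sample = [
    {"name": "New Directory", "node_type": "directory", "files": [
        {"name": "prefs.js", "node_type": "file"}]},
    {"name": "empty", "node_type": "directory", "files": []},
]
print(collect_paths(sample))
```

Whether this reads better than the class is a matter of taste; the class keeps the decoded data and the results together, while the function is shorter.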

3

If you want to actually receive the strings rather than print them, I suggest using a generator:

def parse(data, parent=''):
    if data is None or not len(data):
        yield parent
    else:
        for node in data:
            for result in parse(
                    node.get('files'), parent + '/' + node.get('name')):
                yield result

You could also use a variant of the yield parent statement so that /main/source2 comes back with a trailing slash (/main/source2/), though I find it a bit verbose:

        yield parent + ('/' if data is not None and not len(data) else '')
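
Put together, the trailing-slash variant might look like the sketch below. It assumes Python 3.3 or later for yield from; the name parse_with_slash and the sample data are illustrative only.

```python
def parse_with_slash(data, parent=''):
    if data is None:
        # File node: it has no 'files' key, so .get('files') returned None
        yield parent
    elif not data:
        # Directory with an empty 'files' list
        yield parent + '/'
    else:
        for node in data:
            yield from parse_with_slash(
                node.get('files'), parent + '/' + node.get('name'))


# Assumed sample data in the same shape as the API response
sample = [
    {"name": "main", "node_type": "directory", "files": [
        {"name": "source1", "node_type": "directory", "files": [
            {"name": "messages.po", "node_type": "file"}]},
        {"name": "source2", "node_type": "directory", "files": []},
    ]},
]

for item in parse_with_slash(sample):
    print(item)
```
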

Pass your JSON-decoded list to the parse function above and you get an iterator that yields the strings found in the data:

import json

# shamelessly ignoring PEP8 for the sake of space
data = '''
[{"files": [{"files": [{"files": [{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:42",
"name": "messages.po", "created": "2014-08-14 08:51:41"}], "node_type": "directory", "name": "letters"}], "node_type": "directory",
"name": "source1"}, {"files": [], "node_type": "directory", "name": "source2"}], "node_type": "directory", "name": "main"}, {"files":
[{"node_type": "file", "last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:11:53", "name": "prefs.js", "created":
"2014-08-14 08:11:53"}], "node_type": "directory", "name": "New Directory"}, {"files": [{"files": [{"files": [{"node_type": "file",
"last_accessed": "0000-00-00 00:00:00", "last_updated": "2014-08-14 08:51:30", "name": "cli.mo", "created": "2014-08-14 08:51:30"}],
"node_type": "directory", "name": "333"}], "node_type": "directory", "name": "222"}], "node_type": "directory", "name": "111"}]
'''

for item in parse(json.loads(data)):
    print(item)

Running the code above prints:

/main/source1/letters/messages.po
/main/source2
/New Directory/prefs.js
/111/222/333/cli.mo
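
The paths come out in the order of the JSON, not in the order listed in the question. The question does not state a sorting rule, so the following is only a guess: a case-insensitive sort happens to reproduce the order the asker showed. The paths list here simply repeats the output above.

```python
paths = [
    '/main/source1/letters/messages.po',
    '/main/source2',
    '/New Directory/prefs.js',
    '/111/222/333/cli.mo',
]

# Compare lowercased strings so "New Directory" sorts after "main"
ordered = sorted(paths, key=str.lower)
for p in ordered:
    print(p)
```
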

There is an interesting discussion of generators on SO: What does the "yield" keyword do in Python? I suggest reading all of the answers.
