Python：递归创建字典从路径

12 投票

5 回答

18114 浏览

提问于 2025-04-15 17:43

我有好几万个网址，我想为它们生成一些统计数据。比如说，我有：

/a/b/c
/a/b/d
/a/c/d
/b/c/d
/b/d/e
/a/b/c
/b/c/d

我想创建一个像这样的字典：

{
    'a': {
        'b': {
            'c': {
                '_count': 2
            },
            'd': {
                '_count': 1
            }
        },
        'c': {
            'd': {
                '_count': 1
            }
        }
    },
    'b': {
        'c': {
            'd': {
                '_count': 2
            }
        },
        'd': {
            'e': {
                '_count': 1
            }
        }
    }
}

有没有什么聪明的方法可以做到这一点？

补充说明

我还得提一下，这些网址的路径不一定都是三部分的。可能会有像这样 /a/b/c/d/e/f/g/h 的情况……等等等等。

数据结构字典生成递归算法数据统计路径解析

5 个回答

虽然这个答案有点旧，但在谷歌上还是排在前面，所以我来更新一下：你可以用dpath-python来实现这个。

$ easy_install dpath
>>> result = {}
>>> for path in my_list_of_paths:
>>> ... dpath.util.set(result, path, SOME_VALUE)

... 就这样。我不太明白你用来预先计算那些终点值（比如1、2等）的数学方法，不过你可以提前计算好，然后用一个路径到值的字典，而不是直接用一个简单的列表。

>>> x = {'path/name': 0, 'other/path/name': 1}
>>> for (path, value) in x.iteritems():
>>> ... dpath.util.set(result, path, value)

类似这样的做法是可行的。

回答于 2025-04-15 由 Python大师

分享举报

编辑：

我已经根据你最后的评论修改了我的代码，现在没有复杂的数据结构了。

def dictizeString(string, dictionary):
    while string.startswith('/'):
        string = string[1:]
    parts = string.split('/', 1)
    if len(parts) > 1:
        branch = dictionary.setdefault(parts[0], {})
        dictizeString(parts[1], branch)
    else:
        if dictionary.has_key(parts[0]):
             # If there's an addition error here, it's because invalid data was added
             dictionary[parts[0]] += 1
        else:
             dictionary[parts[0]] = 1

它将为每个项目存储一个 [频率, 字典] 的列表。

测试案例

>>> d = {}
>>> dictizeString('/a/b/c/d', d)
>>> dictizeString('/a/b/c/d', d)
>>> dictizeString('/a/b/c/d', d)
>>> dictizeString('/a/b/c/d', d)
>>> dictizeString('/a/b/e', d)
>>> dictizeString('/c', d)
>>> d
{'a': {'b': {'c': {'d': 4}, 'e': 1}}, 'c': 1}

回答于 2025-04-15 由 Python大师

分享举报

如果你的路径都像你给的例子那样，这段代码就能用：

counts = {}
for p in paths:
   parts = p.split('/')
   branch = counts
   for part in parts[1:-1]:
      branch = branch.setdefault(part, {})
   branch[parts[-1]] = 1 + branch.get(parts[-1], 0)

这段代码使用了字典的一些方法，比如setdefault()和get()，这样就不用写很多的if语句了。

需要注意的是，如果一个有子目录的路径也可以单独出现，那么这个方法就不适用了。因为这样就不清楚counts中对应的部分应该是一个数字还是另一个字典。在这种情况下，最好是为每个节点同时存储一个计数和一个字典，可以使用元组或者自定义类来实现。

基本的算法还是一样的：

class Stats(object):
   def __init__(self):
      self.count = 0
      self.subdirs = {}

counts = Stats()
for p in paths:
   parts = p.split('/')
   branch = counts
   for part in parts[1:]:
      branch = branch.subdirs.setdefault(part, Stats())
   branch.count += 1

经过一些美化后，你会得到：

def printstats(stats, indent=''):
   print indent + str(stats.count) + ' times'
   for (d, s) in stats.subdirs.items():
       print indent + d + ':'
       printstats(s, indent + '  ')

>>> printstats(counts)
0 times
a:
  0 times
  c:
    0 times
    d:
      1 times
  b:
    0 times
    c:
      2 times
    d:
      1 times
...

回答于 2025-04-15 由 Python大师

分享举报

Python：递归创建字典从路径

5 个回答

撰写回答