How do I `pd.concat` a nested dictionary?

Asked 2025-04-14 15:21

I am trying to use pd.concat to combine several DataFrames. Basically, I want to follow the approach in this post to handle a four-level nested dictionary.

Here is what I tried with a simple example -

import pandas as pd
nested_dict = {
    'level1': {
        'level2': {
            'level3': {
                'level4': 'value'
            }
        }
    }
}

for key0, value0 in nested_dict.items():
    for key1, value1 in value0.items():
        for key2, value2 in value1.items():
            for key3, value3 in value2.items():
                out = pd.concat(key3:{pd.DataFrame(key2:{pd.DataFrame(key1:{pd.DataFrame({key0: pd.DataFrame(value0)})})})}, axis = 1)

Unfortunately, I get this error -

    out = pd.concat(key3:{pd.DataFrame(key2:{pd.DataFrame(key1:{pd.DataFrame({key0: pd.DataFrame(value0)})})})}, axis = 1)
                        ^
SyntaxError: invalid syntax

This is the output I want -

  level1
  level2
  level3
  level4
0  value
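The target above is a one-row DataFrame whose single column is labelled by a four-level MultiIndex. As a minimal sketch (assuming every branch of the dict ends in a scalar), that structure can also be built without pd.concat by flattening the dict into tuple key paths and passing them to pd.MultiIndex.from_tuples. This is a swapped-in technique, not the approach from the linked post, and flatten_keys is a hypothetical helper name.

```python
import pandas as pd

# The question's example input
nested_dict = {
    'level1': {
        'level2': {
            'level3': {
                'level4': 'value'
            }
        }
    }
}


def flatten_keys(d, prefix=()):
    """Hypothetical helper: yield (key_path, leaf_value) pairs from a nested dict."""
    for key, value in d.items():
        if isinstance(value, dict):
            # Descend one level, extending the key path
            yield from flatten_keys(value, prefix + (key,))
        else:
            yield prefix + (key,), value


flat = dict(flatten_keys(nested_dict))
# flat == {('level1', 'level2', 'level3', 'level4'): 'value'}

# One row of leaf values; the key paths become the column MultiIndex
out = pd.DataFrame(
    [list(flat.values())],
    columns=pd.MultiIndex.from_tuples(list(flat.keys())),
)
print(out)
```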

Edit -

I followed the instructions in the answer -


for key0, value0 in nested_dict.items():
    for key1, value1 in value0.items():
        for key2, value2 in value1.items():
            for key3, value3 in value2.items():
                out = pd.concat({key3:pd.DataFrame({key2:pd.DataFrame({key1:pd.DataFrame({key0: pd.DataFrame(value0)})})})}, axis = 1)

Now I get the following error -

Traceback (most recent call last):
  File "/home/thoma/.config/JetBrains/PyCharmCE2023.2/scratches/scratch_14.py", line 16, in <module>
    out = pd.concat({key3:pd.DataFrame({key2:pd.DataFrame({key1:pd.DataFrame({key0: pd.DataFrame(value0)})})})}, axis = 1)
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/core/frame.py", line 663, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 494, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 119, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 657, in _extract_index
    raise ValueError("If using all scalar values, you must pass an index")
ValueError: If using all scalar values, you must pass an index

Edit 2: I tried out = pd.concat([{key3:pd.DataFrame([{key2:pd.DataFrame([{key1:pd.DataFrame([{key0: pd.DataFrame(value0)}])}])}])}], axis = 1)

But now I get this error instead -

Traceback (most recent call last):
  File "/home/thoma/.config/JetBrains/PyCharmCE2023.2/scratches/scratch_14.py", line 16, in <module>
    out = pd.concat([{key3:pd.DataFrame([{key2:pd.DataFrame([{key1:pd.DataFrame([{key0: pd.DataFrame(value0)}])}])}])}], axis = 1)
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/util/_decorators.py", line 317, in wrapper
    return func(*args, **kwargs)
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 369, in concat
    op = _Concatenator(
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 459, in __init__
    raise TypeError(msg)
TypeError: cannot concatenate object of type '<class 'dict'>'; only Series and DataFrame objs are valid

1 Answer


Your brackets are slightly off.

You should write {Key:Value} instead of Key:{Value}.

So in your example, the brackets need to become

pd.concat({key3: pd.DataFrame(...

You need to make this change for every key:value pair, though.
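To illustrate the {key: value} form: when pd.concat is given a mapping instead of a list, the mapping's keys become the outermost level of the labels along the concatenation axis, so with axis=1 each call adds one level on top of the existing columns. A minimal sketch that builds the question's target one level per call, assuming the innermost dict is first turned into a one-row DataFrame:

```python
import pandas as pd

# Innermost level: a one-row DataFrame with the single column 'level4'
inner = pd.DataFrame([{'level4': 'value'}])

# Each pd.concat({key: frame}, axis=1) prepends `key` as a new outer column level
out = pd.concat({'level3': inner}, axis=1)
out = pd.concat({'level2': out}, axis=1)
out = pd.concat({'level1': out}, axis=1)
print(out)
```

Every argument to pd.concat here is a {key: DataFrame} mapping; the DataFrame itself is never built from a dict whose value is another DataFrame.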

Addendum:

For the new error, you just need to put each new pd.DataFrame in a list.

pd.concat([{key3: pd.DataFrame([...])}])

Note that the key3 and key2 DataFrames are now each inside a list. Do the same for the others.
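The two tracebacks in the question come from different checks. pd.DataFrame raises "If using all scalar values, you must pass an index" when none of the dict's values is a 1-D column and no index is given (for example pd.DataFrame({'level4': 'value'})), while pd.concat raises the TypeError as soon as one of the list elements is a plain dict rather than a Series or DataFrame. So, as a sketch under the assumption that every branch has the same depth and ends in scalars, only the innermost dict needs wrapping in a list; every outer level can stay a {key: DataFrame} mapping passed to pd.concat. The recursive helper nested_to_frame is a hypothetical name.

```python
import pandas as pd


def nested_to_frame(d):
    """Hypothetical helper: turn a uniformly nested dict into a one-row
    DataFrame whose columns are a MultiIndex of the dict keys."""
    if not any(isinstance(v, dict) for v in d.values()):
        # Leaf dict of scalars: wrap it in a list to get one row
        # (pd.DataFrame(d) alone would raise the "all scalar values" ValueError)
        return pd.DataFrame([d])
    # Inner dict: build each child frame, then let pd.concat turn the
    # keys into a new outer column level
    return pd.concat({k: nested_to_frame(v) for k, v in d.items()}, axis=1)


nested_dict = {'level1': {'level2': {'level3': {'level4': 'value'}}}}
out = nested_to_frame(nested_dict)
print(out)

# Sibling keys at any level simply become additional columns
wide = nested_to_frame({'a': {'x': 1, 'y': 2}, 'b': {'x': 3}})
print(wide.columns.tolist())
```

Branches of different depth would mix DataFrames and scalars at one level and are not handled by this sketch.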
