How to `pd.concat` a nested dictionary?
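I'm trying to use pd.concat to combine multiple DataFrames. Basically, I'm trying to follow the instructions in this post to handle a four-level nested dictionary.
Here is what I tried with a simple example: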
import pandas as pd
nested_dict = {
    'level1': {
        'level2': {
            'level3': {
                'level4': 'value'
            }
        }
    }
}
for key0, value0 in nested_dict.items():
    for key1, value1 in value0.items():
        for key2, value2 in value1.items():
            for key3, value3 in value2.items():
                out = pd.concat(key3:{pd.DataFrame(key2:{pd.DataFrame(key1:{pd.DataFrame({key0: pd.DataFrame(value0)})})})}, axis = 1)
Unfortunately, I get this error:
out = pd.concat(key3:{pd.DataFrame(key2:{pd.DataFrame(key1:{pd.DataFrame({key0: pd.DataFrame(value0)})})})}, axis = 1)
^
SyntaxError: invalid syntax
This is the output I want:
  level1
  level2
  level3
  level4
0  value
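For reference, the target is a one-row DataFrame whose columns form a four-level MultiIndex. The minimal sketch below builds it directly, relying on pandas turning tuple keys in a dict into MultiIndex columns. It is only meant as something to compare a concat-based result against; the names are the ones from the example.

```python
import pandas as pd

# A dict keyed by a tuple produces MultiIndex columns, one level per tuple element.
expected = pd.DataFrame({("level1", "level2", "level3", "level4"): ["value"]})

print(expected)
print(expected.columns.nlevels)  # 4
```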
Edit:
I followed the instructions in the answer:
for key0, value0 in nested_dict.items():
    for key1, value1 in value0.items():
        for key2, value2 in value1.items():
            for key3, value3 in value2.items():
                out = pd.concat({key3:pd.DataFrame({key2:pd.DataFrame({key1:pd.DataFrame({key0: pd.DataFrame(value0)})})})}, axis = 1)
Now I get the following error:
Traceback (most recent call last):
  File "/home/thoma/.config/JetBrains/PyCharmCE2023.2/scratches/scratch_14.py", line 16, in <module>
    out = pd.concat({key3:pd.DataFrame({key2:pd.DataFrame({key1:pd.DataFrame({key0: pd.DataFrame(value0)})})})}, axis = 1)
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/core/frame.py", line 663, in __init__
    mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 494, in dict_to_mgr
    return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 119, in arrays_to_mgr
    index = _extract_index(arrays)
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/core/internals/construction.py", line 657, in _extract_index
    raise ValueError("If using all scalar values, you must pass an index")
ValueError: If using all scalar values, you must pass an index
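This ValueError is raised by `pd.DataFrame` when it is given a dict and cannot work out how many rows to build, for example when every value is a scalar and no index is passed. The minimal sketch below reproduces it with the leaf dict from the example and then shows two common ways around it. Both are standard constructor behaviour, not something specific to this question.

```python
import pandas as pd

leaf = {"level4": "value"}  # innermost dict from the example: all values are scalars

try:
    pd.DataFrame(leaf)  # no row count can be inferred from scalars alone
except ValueError as exc:
    print("ValueError:", exc)

# Fix 1: give an explicit one-row index.
df_index = pd.DataFrame(leaf, index=[0])

# Fix 2: a list of dicts is read as a list of rows, so this is one row.
df_rows = pd.DataFrame([leaf])

print(df_index)
print(df_rows)
```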
Edit 2: I tried
out = pd.concat([{key3:pd.DataFrame([{key2:pd.DataFrame([{key1:pd.DataFrame([{key0: pd.DataFrame(value0)}])}])}])}], axis = 1)
but now I get this error:
Traceback (most recent call last):
  File "/home/thoma/.config/JetBrains/PyCharmCE2023.2/scratches/scratch_14.py", line 16, in <module>
    out = pd.concat([{key3:pd.DataFrame([{key2:pd.DataFrame([{key1:pd.DataFrame([{key0: pd.DataFrame(value0)}])}])}])}], axis = 1)
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/util/_decorators.py", line 317, in wrapper
    return func(*args, **kwargs)
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 369, in concat
    op = _Concatenator(
  File "/home/thoma/anaconda3/envs/benchmark/lib/python3.8/site-packages/pandas/core/reshape/concat.py", line 459, in __init__
    raise TypeError(msg)
TypeError: cannot concatenate object of type '<class 'dict'>'; only Series and DataFrame objs are valid
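This TypeError concerns the argument to pd.concat itself. When a list is passed, every element must already be a Series or DataFrame, and here the list contains a dict. If the list form is preferred, the labels for the new outer column level can be passed separately through the `keys` parameter. The minimal sketch below uses two small illustrative frames; `a` and `b` are made-up names.

```python
import pandas as pd

a = pd.DataFrame({"x": [1]})
b = pd.DataFrame({"y": [2]})

try:
    pd.concat([{"a": a}], axis=1)  # a dict inside a list is not a valid object
except TypeError as exc:
    print("TypeError:", exc)

# List form: the frames go in the list, their labels go in `keys`.
out = pd.concat([a, b], keys=["a", "b"], axis=1)
print(out.columns.tolist())  # [('a', 'x'), ('b', 'y')]
```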
1 Answer
Your brackets are slightly off. You should use {Key:Value} instead of Key:{Value}.
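To illustrate the `{Key: Value}` form: when pd.concat receives a dict, its keys become the outermost level of the result's labels along the concatenation axis, so with axis=1 they sit above each DataFrame's columns. The minimal sketch below does this for a single level, using the innermost part of the example.

```python
import pandas as pd

inner = pd.DataFrame({"level4": ["value"]})  # one row, one column

# The dict key "level3" is added as an extra column level on top of "level4".
out = pd.concat({"level3": inner}, axis=1)
print(out.columns.tolist())  # [('level3', 'level4')]
```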
So in your example, the brackets need to change to
pd.concat({key3: pd.DataFrame(...
However, you need to make this change for every key:value combination.
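Applying that change by hand at every level gets unwieldy, and a recursive function can do it instead. This is a sketch rather than the answer's own code. It assumes every branch of the dict has the same depth and that the innermost dicts hold scalar values; `nested_to_frame` is a hypothetical helper name.

```python
import pandas as pd


def nested_to_frame(d):
    """Turn a uniformly nested dict into a one-row DataFrame with MultiIndex columns.

    Assumes all branches have the same depth and the innermost dicts map to scalars.
    """
    if all(isinstance(v, dict) for v in d.values()):
        # Recurse into each child, then let pd.concat add this level's keys on top.
        return pd.concat({k: nested_to_frame(v) for k, v in d.items()}, axis=1)
    # Innermost dict: scalars only, so wrap it in a list to get a single row.
    return pd.DataFrame([d])


nested_dict = {"level1": {"level2": {"level3": {"level4": "value"}}}}
out = nested_to_frame(nested_dict)
print(out)
```

Each recursion level plays the role of one of the nested for loops in the question, so the loops are no longer needed.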
Update:
For the new error, you just need to put each new pd.DataFrame into a list:
pd.concat([{key3: pd.DataFrame([...])}])
Note that the key3 and key2 DataFrames are now each inside a list. Do the same for the others.