具有分层索引的DataFrame中的列中的断言错误

2024-04-26 23:50:48 发布

您现在位置:Python中文网/ 问答频道 /正文

另一个熊猫问题:

我有一个分层索引表:

In [51]:
from pandas import DataFrame
f = DataFrame({'a': ['1','2','3'], 'b': ['2','3','4']})
f.columns = [['level1 item1', 'level1 item2'],['', 'level2 item2'], ['level3 item1', 'level3 item2']]
f
Out[51]:
    level1 item1    level1 item2
                    level2 item2
    level3 item1    level3 item2
0         1              2
1         2              3
2         3              4

选择level1 item1会产生以下错误:

^{pr2}$

不过,这似乎与级别的数量有一定关系。当我将级别的数量减少到只有两个时,就不会出现这样的错误:

from pandas import DataFrame
f = DataFrame({'a': ['1','2','3'], 'b': ['2','3','4']})
f.columns = [['level1 item1', 'level1 item2'],['', 'level1 item2']]
f
Out[59]:
     level1 item1   level1 item2
                    level1 item2
0          1              2
1          2              3
2          3              4

相反,前面的数据帧给出了预期的序列:

In [63]:
f['level1 item1']
Out[63]:
0    1
1    2
2    3
Name: level1 item1

用虚拟字符填充level1 item1下面的空白可以“修复”这个问题,但这不是一个好的解决方案。在

如何在不使用虚拟名称填充这些列的情况下解决此问题?在

非常感谢!在


原始示例:

enter image description here

此表使用以下索引生成:

index = [np.array(['Size and accumulated size of adjusted gross income', 'All returns', 'All returns', 'All returns', 'All returns', 'All returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns']),
np.array(['', 'Number of returns', 'Percent of total', 'Adjusted gross income less deficit', 'Adjusted gross income less deficit', 'Adjusted gross income less deficit', 'Number of returns', 'Percent of total', 'Adjusted gross income less deficit', 'Adjusted gross income less deficit', 'Taxable income', 'Taxable income', 'Taxable income', 'Income tax after credits', 'Income tax after credits', 'Income tax after credits', 'Total income tax', 'Total income tax', 'Total income tax', 'Total income tax', 'Total income tax']),
np.array(['', '', '', '', '', '', '', '','', '', 'Number of returns', 'Amount', 'Percent of total', 'Number of returns', 'Amount', 'Percent of total', 'Amount', 'Percent of', 'Percent of', 'Percent of', 'Average total income tax (dollars)']),
np.array(['', '', '', 'Amount', 'Percent of total', 'Average (dollars)', 'Average (dollars)', 'Average (dollars)', 'Amount', 'Percent of total', 'Percent of total', 'Percent of total', 'Percent of total', 'Percent of total', 'Percent of total', 'Percent of total', 'Percent of total', 'Total', 'Taxable income', 'Adjusted gross income less deficit', 'Adjusted gross income less deficit'])]

df.columns = index

这几乎是CSV文件中一些数据的完美副本,但你可以看到,在“回报数”、“总额百分比”和“调整后总收入减去赤字”下面有一个缺口。当我试图选择返回数量时,这个差距会产生这个错误:

In [68]: df['Taxable returns']['Number of returns']
AssertionError: Index length did not match values

我不明白这个错误。所以如果你能给我一个很好的解释,我将不胜感激。在任何情况下,当我使用这个索引填补这个空白时(注意第三个numpy数组中的第一个元素):

index = [np.array(['Size and accumulated size of adjusted gross income', 'All returns', 'All returns', 'All returns', 'All returns', 'All returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns', 'Taxable returns']),
np.array(['', 'Number of returns', 'Percent of total', 'Adjusted gross income less deficit', 'Adjusted gross income less deficit', 'Adjusted gross income less deficit', 'Number of returns', 'Percent of total', 'Adjusted gross income less deficit', 'Adjusted gross income less deficit', 'Taxable income', 'Taxable income', 'Taxable income', 'Income tax after credits', 'Income tax after credits', 'Income tax after credits', 'Total income tax', 'Total income tax', 'Total income tax', 'Total income tax', 'Total income tax']),
np.array(['1', '2', '3', '4', '5', '6', '7', '8','9', '10', 'Number of returns', 'Amount', 'Percent of total', 'Number of returns', 'Amount', 'Percent of total', 'Amount', 'Percent of', 'Percent of', 'Percent of', 'Average total income tax (dollars)']),
np.array(['', '', '', 'Amount', 'Percent of total', 'Average (dollars)', 'Average (dollars)', 'Average (dollars)', 'Amount', 'Percent of total', 'Percent of total', 'Percent of total', 'Percent of total', 'Percent of total', 'Percent of total', 'Percent of total', 'Percent of total', 'Total', 'Taxable income', 'Adjusted gross income less deficit', 'Adjusted gross income less deficit'])]

df.columns = index

我得到了正确的结果:

In [71]: df['Taxable returns']['Number of returns']
Out[71]:
7
Average (dollars)
0    90,660,104
1    3,495
...

Tags: ofreturnstotallesstaxpercentitem1income
1条回答
网友
1楼 · 发布于 2024-04-26 23:50:48

我昨天就想办法解决这个问题。以下是github master上的新行为:

In [1]: paste
from pandas import DataFrame
f = DataFrame({'a': ['1','2','3'], 'b': ['2','3','4']})
f.columns = [['level1 item1', 'level1 item2'],['', 'level2 item2'], ['level3 item1', 'level3 item2']]
f

##   End pasted text  
Out[1]: 
  level1 item1 level1 item2
               level2 item2
  level3 item1 level3 item2
0            1            2
1            2            3
2            3            4

In [2]: f['level1 item1']
Out[2]: 
  level3 item1
0            1
1            2
2            3

相关问题 更多 >