MultiIndex selection of hierarchical columns

Posted 2024-04-24 16:41:39


Goal: take the raw data fetched from Eurostat through the pandas DataReader and reshape it so that it has a pandas DateTime index and one column per country.

Code:

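import datetime

import pandas as pd
import pandas_datareader as web

# Request the full available history of the Eurostat dataset 'tipsii20'
start = datetime.datetime(1900, 1, 1)
end = datetime.date.today()
df2 = web.DataReader('tipsii20', 'eurostat', start=start, end=end)

# Inspect the hierarchical column index
df2.columns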

Looking at the columns, we can see that we are working with a MultiIndex:

MultiIndex(levels=[[u'Rest of the world'], [u'Net liabilities (liabilities minus assets)'], [u'Net external debt'], [u'Percentage of gross domestic product (GDP)'], [u'Unadjusted data (i.e. neither seasonally adjusted nor calendar adjusted data)'], [u'Austria', u'Belgium', u'Bulgaria', u'Croatia', u'Cyprus', u'Czech Republic', u'Denmark', u'Estonia', u'Finland', u'France', u'Germany (until 1990 former territory of the FRG)', u'Greece', u'Hungary', u'Ireland', u'Italy', u'Latvia', u'Lithuania', u'Luxembourg', u'Malta', u'Netherlands', u'Poland', u'Portugal', u'Romania', u'Slovakia', u'Slovenia', u'Spain', u'Sweden', u'United Kingdom'], [u'Annual']], labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 1, 2, 4, 5, 10, 6, 7, 11, 25, 8, 9, 3, 12, 13, 14, 16, 17, 15, 18, 19, 20, 21, 22, 26, 24, 23, 27], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]], names=[u'PARTNER', u'STK_FLOW', u'BOP_ITEM', u'UNIT', u'S_ADJ', u'GEO', u'FREQ'])
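In the printed MultiIndex, levels lists the distinct values of each level, and labels (renamed codes in pandas 0.24) gives, for every column, the position of that column's value within the level. The sketch below uses a hypothetical three-level, three-column index with country names taken from this question. It shows that decoding the GEO codes reproduces what get_level_values returns, which is the operation both answers rely on:

```python
import pandas as pd

# Hypothetical, shortened MultiIndex: only PARTNER, GEO and FREQ are kept,
# and GEO is given out of alphabetical order on purpose.
mi = pd.MultiIndex.from_arrays(
    [
        ["Rest of the world"] * 3,
        ["Belgium", "Austria", "Bulgaria"],
        ["Annual"] * 3,
    ],
    names=["PARTNER", "GEO", "FREQ"],
)

geo = mi.names.index("GEO")      # position of GEO (1 here, 5 in df2)
print(list(mi.levels[geo]))      # ['Austria', 'Belgium', 'Bulgaria'] (sorted distinct values)
print(mi.codes[geo].tolist())    # [1, 0, 2] (one code per column)

# Decoding the codes gives the per-column labels...
decoded = mi.levels[geo].take(mi.codes[geo])
# ...which is exactly what get_level_values returns.
print(list(decoded) == list(mi.get_level_values("GEO")))  # True
```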

I want to transform this dataset so that it keeps its DateTime index but uses the GEO level (names[5]) as the columns. Should this be done with df2.xs?
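df2.xs can do this, but only if it is given one key for every non-GEO level. With axis=1 and a list of level names, xs keeps the columns that match the tuple key and drops those levels, which leaves GEO as a flat column index. The sketch below runs on a small hypothetical stand-in for df2: two countries and two years, with the values copied from the output in the answer. It assumes, as the printed MultiIndex shows, that every level other than GEO holds a single value:

```python
import pandas as pd

# Hypothetical stand-in for df2 with the same seven column levels.
names = ["PARTNER", "STK_FLOW", "BOP_ITEM", "UNIT", "S_ADJ", "GEO", "FREQ"]
fixed = ("Rest of the world", "Net liabilities (liabilities minus assets)",
         "Net external debt", "Percentage of gross domestic product (GDP)",
         "Unadjusted data (i.e. neither seasonally adjusted nor calendar adjusted data)")
cols = pd.MultiIndex.from_tuples(
    [fixed + (g, "Annual") for g in ("Austria", "Belgium")], names=names)
idx = pd.DatetimeIndex(["2010-01-01", "2011-01-01"], name="TIME_PERIOD")
df2 = pd.DataFrame([[28.0, -121.2], [24.0, -118.8]], index=idx, columns=cols)

# Every level except GEO, plus the single value each of them holds.
others = [n for n in df2.columns.names if n != "GEO"]
assert all(df2.columns.get_level_values(n).nunique() == 1 for n in others)
key = tuple(df2.columns.get_level_values(n)[0] for n in others)

# xs wants a tuple (not a list) as key; the matched levels are dropped.
by_geo = df2.xs(key, axis=1, level=others)
print(by_geo)
```

If one of those levels held several values, xs would keep only the matching slice. Check the uniqueness assertion before relying on this approach.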


2 answers

You can use MultiIndex.droplevel to drop every column level except GEO:

df2.columns = df2.columns.droplevel([0,1,2,3,4,6])
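Hard-coding the positions [0,1,2,3,4,6] silently picks the wrong level if the dataset ever gains or reorders a dimension. Here is a hedged variant on a hypothetical two-country stand-in for df2 that derives the levels to drop from their names instead. MultiIndex.droplevel accepts level names as well as positions:

```python
import pandas as pd

# Hypothetical stand-in for df2 (same seven column levels, two countries).
names = ["PARTNER", "STK_FLOW", "BOP_ITEM", "UNIT", "S_ADJ", "GEO", "FREQ"]
fixed = ("Rest of the world", "Net liabilities (liabilities minus assets)",
         "Net external debt", "Percentage of gross domestic product (GDP)",
         "Unadjusted data (i.e. neither seasonally adjusted nor calendar adjusted data)")
cols = pd.MultiIndex.from_tuples(
    [fixed + (g, "Annual") for g in ("Austria", "Belgium")], names=names)
idx = pd.DatetimeIndex(["2010-01-01", "2011-01-01"], name="TIME_PERIOD")
df2 = pd.DataFrame([[28.0, -121.2], [24.0, -118.8]], index=idx, columns=cols)

# Drop every level whose name is not GEO, wherever it sits.
to_drop = [n for n in df2.columns.names if n != "GEO"]
df2.columns = df2.columns.droplevel(to_drop)
print(df2)
```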

If you know the level name (similar to Bharath shetty's solution), another option is:

df2.columns = df2.columns.get_level_values('GEO')
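Collapsing the columns to GEO alone only works because each country appears exactly once. If a dimension such as UNIT held two values, get_level_values('GEO') would produce duplicate column labels. The sketch below is a minimal guard written as a hypothetical helper, geo_columns, with a made-up second unit ("Million euro") to show the failure case:

```python
import pandas as pd


def geo_columns(columns: pd.MultiIndex) -> pd.Index:
    """Hypothetical helper: return the GEO level as flat columns, refusing duplicates."""
    geo = columns.get_level_values("GEO")
    if not geo.is_unique:
        raise ValueError("GEO alone does not identify each column")
    return geo


names = ["UNIT", "GEO"]
# One unit per country: safe to collapse.
ok = pd.MultiIndex.from_tuples(
    [("Percentage of gross domestic product (GDP)", "Austria"),
     ("Percentage of gross domestic product (GDP)", "Belgium")],
    names=names,
)
# Hypothetical second unit: Austria now appears twice.
clash = pd.MultiIndex.from_tuples(
    [("Percentage of gross domestic product (GDP)", "Austria"),
     ("Million euro", "Austria")],
    names=names,
)

print(list(geo_columns(ok)))  # ['Austria', 'Belgium']
try:
    geo_columns(clash)
except ValueError as err:
    print("refused:", err)
```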

pd.DataFrameget_level_values(5)一起使用,因为GEO在列的第五级,以防您希望保留数据帧以供将来参考,即

ndf = pd.DataFrame(df2.values, index=df2.index, columns=df2.columns.get_level_values(5))
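To keep df2 intact, another option is DataFrame.droplevel with axis=1. It returns a new DataFrame with the reduced column index and leaves the original columns alone. Here is a sketch on a hypothetical two-country stand-in for df2:

```python
import pandas as pd

# Hypothetical stand-in for df2 (same seven column levels, two countries).
names = ["PARTNER", "STK_FLOW", "BOP_ITEM", "UNIT", "S_ADJ", "GEO", "FREQ"]
fixed = ("Rest of the world", "Net liabilities (liabilities minus assets)",
         "Net external debt", "Percentage of gross domestic product (GDP)",
         "Unadjusted data (i.e. neither seasonally adjusted nor calendar adjusted data)")
cols = pd.MultiIndex.from_tuples(
    [fixed + (g, "Annual") for g in ("Austria", "Belgium")], names=names)
idx = pd.DatetimeIndex(["2010-01-01", "2011-01-01"], name="TIME_PERIOD")
df2 = pd.DataFrame([[28.0, -121.2], [24.0, -118.8]], index=idx, columns=cols)

# New frame with only GEO in the columns; df2 keeps all seven levels.
ndf = df2.droplevel([0, 1, 2, 3, 4, 6], axis=1)

print(df2.columns.nlevels)  # 7
print(ndf)
```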

Or assign the level values to the columns directly:

df2.columns = df2.columns.get_level_values(5)

Output:

print(ndf.head().iloc[:,:4])

GEO          Austria  Belgium  Bulgaria  Cyprus
TIME_PERIOD                                    
2010-01-01      28.0   -121.2      37.1    70.9
2011-01-01      24.0   -118.8      29.6   127.1
2012-01-01      25.8   -102.7      25.4   137.2
2013-01-01      20.1    -88.4      21.6   140.0
2014-01-01      20.0    -71.1      18.3   136.1
