Pandas - Converting a DataFrame MultiIndex to datetime objects

6 votes

2 answers

4262 views

Asked 2025-04-18 08:55

Consider an input file, b.dat:

string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,4/29/11 11:59am,1.0
a string,5/2/11 1:41pm,2.0
a string,5/2/11 2:02pm,3.0
a string,5/2/11 2:56pm,4.0
a string,5/2/11 3:00pm,5.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0

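I can aggregate the totals for each month like this:

import pandas as pd

b = pd.read_csv('b.dat')
b['date'] = pd.to_datetime(b['date'], format='%m/%d/%y %I:%M%p')
b = b.set_index('date')
# group on the year and month of the DatetimeIndex (pd.groupby() no longer exists)
bg = b.groupby([b.index.year, b.index.month])
bgs = bg.sum(numeric_only=True)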

The index of the aggregated result looks like this:

bgs

            number
2011 2       1
     3       2
     4       8
     5      14
2014 5      13

bgs.index

MultiIndex(levels=[[2011, 2014], [2, 3, 4, 5]],
       labels=[[0, 0, 0, 0, 1], [0, 1, 2, 3, 3]])

I would like to convert this index to datetime objects (the day can be set to the first of each month).

I tried the following:

bgs.index = pd.to_datetime(bgs.index)

and also

bgs.index = pd.DatetimeIndex(bgs.index)

Neither approach works. Does anyone know how I can do this?

pandas dataframe multi-index aggregation datetime-format

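2 Answers

You can build a column from the index by computing the date you want, then set that column as the index:

import datetime
# each entry of the MultiIndex is a (year, month) tuple
bgs['expanded_date'] = bgs.index.map(lambda x: datetime.date(x[0], x[1], 1))
bgs = bgs.set_index('expanded_date')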
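
Note that datetime.date values give a plain object index rather than a DatetimeIndex. If a real DatetimeIndex is wanted, one possible variation is to build a pd.Timestamp per (year, month) pair instead. The following is only a sketch: the small bgs frame is a hypothetical stand-in for the aggregated result in the question, and first_of_month is a name made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the question's bgs: a (year, month) MultiIndex.
idx = pd.MultiIndex.from_tuples(
    [(2011, 2), (2011, 3), (2011, 4), (2011, 5), (2014, 5)]
)
bgs = pd.DataFrame({"number": [1.0, 2.0, 8.0, 14.0, 13.0]}, index=idx)

# Iterating a MultiIndex yields one tuple per row; pin each to day 1.
first_of_month = pd.DatetimeIndex(
    [pd.Timestamp(year=int(y), month=int(m), day=1) for y, m in bgs.index],
    name="date",
)
bgs.index = first_of_month
print(bgs)
```

After this, bgs.index is a DatetimeIndex, so date-based slicing and resampling are available on the aggregated frame.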

Answered 2025-04-18 by Python大师

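Consider resampling with 'M' (month end; pandas 2.2 and later spell this alias 'ME') instead of grouping by attributes of the DatetimeIndex:

In [11]: b.resample('M')[['number']].sum(min_count=1).dropna()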
Out[11]:
            number
date
2011-02-28       1
2011-03-31       2
2011-04-30       8
2011-05-31      14
2014-05-31      13

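Note: resampling produces a row for every month in the range, so months with no data show up as NaN. Drop them if you don't want them.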
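
Since the question asks for the first day of each month rather than the last, the month-start alias 'MS' may be a closer fit. Here is a minimal sketch under a few assumptions: only some rows of b.dat are inlined so the example is self-contained, the names csv_text and monthly are made up for illustration, and min_count=1 is passed so that empty months come out as NaN (recent pandas versions would otherwise sum them to 0, leaving nothing for dropna() to remove).

```python
import io
import pandas as pd

# A few rows from b.dat, inlined so the sketch runs on its own.
csv_text = """string,date,number
a string,2/5/11 9:16am,1.0
a string,3/5/11 10:44pm,2.0
a string,4/22/11 12:07pm,3.0
a string,4/22/11 12:10pm,4.0
a string,5/2/14 3:02pm,6.0
a string,5/2/14 3:18pm,7.0
"""
b = pd.read_csv(io.StringIO(csv_text))
b["date"] = pd.to_datetime(b["date"], format="%m/%d/%y %I:%M%p")
b = b.set_index("date")

# "MS" labels each monthly bin with the first day of the month.
# min_count=1 keeps empty months as NaN so dropna() can remove them.
monthly = b[["number"]].resample("MS").sum(min_count=1).dropna()
print(monthly)
```

The result is indexed directly by a DatetimeIndex of month starts, so no MultiIndex conversion is needed afterwards.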

Answered 2025-04-18 by Python大师
