Output a Pandas DataFrame as JSON
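I have a Pandas DataFrame with a DatetimeIndex and several columns of hourly data. I want to convert one of those columns into a JSON file that holds an array of arrays, where each inner array contains the hourly values for one day.
A simple example:
If my DataFrame looks like this: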
In [106]:
rng = pd.date_range('1/1/2011 01:00:00', periods=12, freq='H')
df = pd.DataFrame(np.random.randn(12, 1), index=rng, columns=['A'])
In [107]:
df
Out[107]:
                            A
2011-01-01 01:00:00 -0.067214
2011-01-01 02:00:00  0.820595
2011-01-01 03:00:00  0.442557
2011-01-01 04:00:00 -1.000434
2011-01-01 05:00:00 -0.760783
2011-01-01 06:00:00 -0.106619
2011-01-01 07:00:00  0.786618
2011-01-01 08:00:00  0.144663
2011-01-01 09:00:00 -1.455017
2011-01-01 10:00:00  0.865593
2011-01-01 11:00:00  1.289754
2011-01-01 12:00:00  0.601067
I would like to get a JSON file like this:
[
[-0.0672138259,0.8205950583,0.4425568167,-1.0004337373,-0.7607833867,-0.1066187698,0.7866183048,0.1446634381,-1.4550165851,0.8655931982,1.2897541164,0.6010672247]
]
My real DataFrame spans many days, so the result would look roughly like this:
[
[value@hour1day1, value@hour2day1.....value@hour24day1],
[value@hour1day2, value@hour2day2.....value@hour24day2],
[value@hour1day3, value@hour2day3.....value@hour24day3],
....
[value@hour1LastDay, value@hour2LastDay.....value@hour24LastDay]
]
1 Answer
import json
import pandas as pd
import numpy as np
rng = pd.date_range('1/1/2011 01:00:00', periods=12, freq='H')
df = pd.DataFrame(np.random.randn(12, 1), index=rng, columns=['A'])
# Transpose so the single column becomes one row, then convert to nested Python lists
print(json.dumps(df.T.values.tolist(), indent=4))
Output:
[
[
-0.6916923670267555,
0.23075256008033393,
1.2390943452146521,
-0.9421708175530891,
-1.4622768586461448,
-0.3973987276444045,
-0.04983495806442656,
-1.9139530636627042,
1.9562147260518052,
-0.8296105620697014,
0.2888681009437529,
-2.3943000262784424
]
]
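Note that the transpose yields one inner list per column rather than one per day, so it matches the requested layout only while the frame holds a single day. A minimal sketch to illustrate, assuming two full days of hourly data in the same column 'A' (the start timestamp is chosen here for illustration only):

```python
import numpy as np
import pandas as pd

# Two full days of hourly data (48 rows) in a single column.
rng = pd.date_range('2011-01-01 00:00:00', periods=48,
                    freq=pd.Timedelta(hours=1))
df = pd.DataFrame(np.random.randn(48, 1), index=rng, columns=['A'])

rows = df.T.values.tolist()
print(len(rows))     # 1: one inner list per column, not per day
print(len(rows[0]))  # 48: both days end up in the same list
```

With more than one day, the values therefore need to be split by day before serializing.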
Or, as a fuller example that covers multiple days, using the groupby functionality:
rng = pd.date_range('1/1/2011 01:00:00', periods=48, freq='H')
df = pd.DataFrame(np.random.randn(48, 1), index=rng, columns=['A'])
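# Group rows by day of the month (this assumes the data stays within a single month)
grouped = df.groupby(lambda x: x.day)
# Collect each day's values of column 'A' as a plain Python list
data = [group['A'].values.tolist() for day, group in grouped]
print(json.dumps(data, indent=4))
Output: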
[
[
-0.8939584996681688,
...
-1.1332895023662326
],
[
-0.1514553673781838,
...
-1.8380494963443343
],
[
-1.8342085568898159
]
]
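Since the question asks for a JSON file and a frame with many days, the following is a minimal sketch of one way to extend the idea. It is not part of the original answer. It groups on the full calendar date, so the same day number in different months stays separate. The helper name daily_hour_arrays and the file name hourly.json are hypothetical, chosen only for illustration.

```python
import json

import numpy as np
import pandas as pd


def daily_hour_arrays(df, column):
    """Return one list of values per calendar date, in chronological order."""
    # normalize() truncates each timestamp to midnight, so the grouping key
    # is the full date (year, month, day), not just the day of the month.
    grouped = df[column].groupby(df.index.normalize())
    return [values.tolist() for _, values in grouped]


# Two full days of hourly sample data, starting at midnight.
rng = pd.date_range('2011-01-01 00:00:00', periods=48,
                    freq=pd.Timedelta(hours=1))
df = pd.DataFrame(np.random.randn(48, 1), index=rng, columns=['A'])

data = daily_hour_arrays(df, 'A')

# 'hourly.json' is a placeholder path; json.dump writes straight to the file.
with open('hourly.json', 'w') as f:
    json.dump(data, f, indent=4)
```

A day with missing hours simply produces a shorter inner list, as in the last group of the output above. If every inner list must have exactly 24 entries, the gaps would need to be filled before grouping.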