I am trying to read a csv file, of which this is a sample:
datetime,check,lat,lon,co_alpha,atn,status,bc
2012-10-27 15:00:59,2,0,0,2.427,,,
2012-10-27 15:01:00,2,0,0,2.407,,,
2012-10-27 15:02:49,2,0,0,2.207,-17.358,0,-16162
2012-10-27 15:02:50,2,0,0,2.207,-17.354,0,8192
2012-10-27 15:02:51,1,0,0,2.207,-17.358,0,-8152
2012-10-27 15:02:52,1,0,0,2.207,-17.358,0,648
2012-10-27 15:06:03,0,51.195076,4.444407,2.349,-17.289,0,4909
2012-10-27 15:06:04,0,51.195182,4.44427,2.344,-17.289,0,587
2012-12-05 09:21:34,,,,,42.960,1,16430
2012-12-05 09:21:35,,,,,42.962,1,3597
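My problem is that in the columns containing only ints, 0 is converted to NaN (for example the columns 'check' and 'status': they contain only ints, but are read as float because there are real missing values). I only want the empty values to be converted to NaN, not the 0s.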
In what I get, there are many NaNs in the 'check' and 'status' columns, whereas in the 'lat' and 'lon' columns 0 is not converted to NaN. I am reading the file with na_values=''.
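I can use the dtype keyword to specify the data type of particular columns as int. This keeps the 0s as 0, but the problem is that these columns also contain real NaNs (empty values). In that case those values are converted to 0 as well, because an int column cannot hold NaN.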
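As a side note to the dtype approach: a minimal sketch, assuming a much newer pandas than the 0.10.x discussed here, in which the nullable "Int64" extension dtype is available. With it an integer column can hold missing values, so 0 stays 0 and empty fields stay missing. CSV_TEXT is a shortened copy of the sample above and df_nullable is just an illustrative name.

```python
import io

import pandas as pd

# Shortened copy of the sample data from the question.
CSV_TEXT = """datetime,check,lat,lon,co_alpha,atn,status,bc
2012-10-27 15:00:59,2,0,0,2.427,,,
2012-10-27 15:02:49,2,0,0,2.207,-17.358,0,-16162
2012-10-27 15:06:03,0,51.195076,4.444407,2.349,-17.289,0,4909
2012-12-05 09:21:34,,,,,42.960,1,16430
"""

# Assumption: the installed pandas supports the nullable "Int64" dtype
# (capital I), which is not the same thing as the plain NumPy int64.
df_nullable = pd.read_csv(
    io.StringIO(CSV_TEXT),
    parse_dates=True,
    index_col=0,
    dtype={"check": "Int64", "status": "Int64"},
)

print(df_nullable.dtypes)
print(df_nullable[["check", "status"]])
```

The other columns are left to pandas' normal type inference, so 'lat', 'lon' and 'atn' are still read as floats.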
EDIT: after upgrading to pandas 0.10.1, it works correctly even without specifying keep_default_na and na_values:
>>> pd.read_clipboard(sep=',', parse_dates=True, index_col=0)
check lat lon co_alpha atn status bc
datetime
2012-10-27 15:00:59 2 0.000000 0.000000 2.427 NaN NaN NaN
2012-10-27 15:01:00 2 0.000000 0.000000 2.407 NaN NaN NaN
2012-10-27 15:02:49 2 0.000000 0.000000 2.207 -17.358 0 -16162
2012-10-27 15:02:50 2 0.000000 0.000000 2.207 -17.354 0 8192
2012-10-27 15:02:51 1 0.000000 0.000000 2.207 -17.358 0 -8152
2012-10-27 15:02:52 1 0.000000 0.000000 2.207 -17.358 0 648
2012-10-27 15:06:03 0 51.195076 4.444407 2.349 -17.289 0 4909
2012-10-27 15:06:04 0 51.195182 4.444270 2.344 -17.289 0 587
2012-12-05 09:21:34 NaN NaN NaN NaN 42.960 1 16430
2012-12-05 09:21:35 NaN NaN NaN NaN 42.962 1 3597
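For anyone who wants to check this without the clipboard, here is a rough equivalent using read_csv on an in-memory string. It is only a sketch under the assumption that read_csv and read_clipboard parse the text the same way; SAMPLE, zeros_kept and missing_check are illustrative names, and the data is a shortened copy of the sample above.

```python
import io

import pandas as pd

# Shortened copy of the sample data from the question.
SAMPLE = """datetime,check,lat,lon,co_alpha,atn,status,bc
2012-10-27 15:00:59,2,0,0,2.427,,,
2012-10-27 15:02:49,2,0,0,2.207,-17.358,0,-16162
2012-10-27 15:06:03,0,51.195076,4.444407,2.349,-17.289,0,4909
2012-12-05 09:21:34,,,,,42.960,1,16430
"""

# Same keyword arguments as the read_clipboard call, minus sep=','
# (a comma is already the default separator for read_csv).
df = pd.read_csv(io.StringIO(SAMPLE), parse_dates=True, index_col=0)

# Count how many 0s survived and how many values are really missing.
zeros_kept = int((df["check"] == 0).sum())
missing_check = int(df["check"].isna().sum())

print(df.dtypes)
print("zeros kept:", zeros_kept, "missing:", missing_check)
```

The 'check' column still comes back as float because of the empty field, but the 0 is kept as 0.0 and only the empty field becomes NaN.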
You first have to set keep_default_na to False, as the docstring for that parameter explains.
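A small sketch of what that looks like in practice, assuming read_csv is the reader being used. With keep_default_na=False, only the strings listed in na_values are treated as missing. Otherwise the list is added to pandas' built-in set of NaN markers. The 'site' column and its values are hypothetical, added only to make the difference visible. The other columns are a cut-down version of the sample in the question.

```python
import io

import pandas as pd

# Cut-down sample; the 'site' column is hypothetical and not part of the
# original data. It contains the literal text "NA" in the first row.
RAW = """datetime,check,status,site
2012-10-27 15:00:59,2,,NA
2012-10-27 15:06:03,0,0,A
2012-12-05 09:21:34,,1,B
"""

# Only the empty string counts as missing here.
strict = pd.read_csv(
    io.StringIO(RAW),
    parse_dates=True,
    index_col=0,
    keep_default_na=False,
    na_values=[""],
)

# Default behaviour for comparison: "NA" is also parsed as missing.
default = pd.read_csv(io.StringIO(RAW), parse_dates=True, index_col=0)

print(strict)
print(default)
```

In both frames the 0 in 'check' is kept and the empty fields become NaN. The only difference is that the literal "NA" text survives in the strict version.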