Pandas dataframe.value_counts() returns float-indexed keys
I am trying to use pandas to get value counts. When I run the following command:
my_variable.value_counts().keys()
I get the following output:
Index([1.0, 0.0, 1.00999999046, 2.0, 2.00999999046, 3.0, 1.01000022888, 3.00999999046, 4.00999999046, 4.0, 6.00999999046, 5.00999999046, 8.00999999046, 2.01000022888, 5.0, 0.990000009537, 9.00999999046, 6.0, 7.0, 12.0099999905, 7.00999999046, 10.0099999905, 3.01000022888, 19.0199999809, 11.0099999905, 20.0199999809, 8.0, 14.0199999809, 4.01000022888, 5.01000022888, 38.0399999619, 46.0499999523, 40.0399999619, 20.0299999714, 16.0199999809, 18.0299999714, 9.01999998093, 11.0199999809, 21.0199999809, -10651.4099998, -4643.13999987, -6388.92000008, -5779.98000002], dtype=object)
My question is: how do I access a key that is a float value, such as the key 1.00999999046?
I can access the index entry 1.0 with:
my_variable.value_counts()[1]
However, if I try:
my_variable.value_counts()[1.00999999046]
I get an error:
KeyError: 1.00999999046
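I suspect this has something to do with the index having dtype object, but I don't know how to deal with it. Any suggestions would be greatly appreciated.
2 Answers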
You can use value_counts().index[i] instead, where i is the position of the index entry you want.
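As a minimal sketch of this positional approach, assuming a small made-up Series in place of my_variable (the names s, counts and key are illustrative and not from the question): taking the key from the index itself yields the exact stored float, which can then be used for a label lookup.

```python
import pandas as pd

# Hypothetical data standing in for my_variable
s = pd.Series([1.0, 1.0, 1.0, 2.5, 2.5, 0.1 + 0.2])

counts = s.value_counts()

# Take the key by position instead of typing the float literal by hand
key = counts.index[2]
print(repr(key))

# The key retrieved this way is the exact stored value, so label lookup works
print(counts.loc[key])

# Position-based access to the count itself
print(counts.iloc[0])
```

Reading the key from counts.index avoids copying a shortened number from the printed output.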
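This works fine in version 0.13 and later. Before 0.13, float indexes were not treated specially. They now have dedicated logic that avoids rounding or truncating index values to integers. In other words, float values are looked up directly rather than being coerced (for a Float64Index). In fact, the purpose of this index type is to provide a uniform indexing model in which [], ix and loc all return the same result.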
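A small sketch of label-based lookup on a float index, using made-up counts (the Series below is hypothetical). Note that ix was removed from pandas in 1.0, so the sketch uses only [], loc and iloc.

```python
import pandas as pd

# Hypothetical counts keyed by float labels
counts = pd.Series([3, 5, 7], index=[1.0, 0.5, 2.25])

# On a float index, [] and .loc both look up by label, not by position
print(counts[2.25], counts.loc[2.25])

# The integer 1 matches the label 1.0 rather than position 1
print(counts[1], counts.loc[1])

# Positional access has to be requested explicitly
print(counts.iloc[1])
```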
See the documentation for more information.
In [7]: from pandas import Index, Series
In [8]: i = Index([1.0, 0.0, 1.00999999046, 2.0, 2.00999999046, 3.0, 1.01000022888, 3.00999999046, 4.00999999046, 4.0, 6.00999999046, 5.00999999046, 8.00999999046, 2.01000022888, 5.0, 0.990000009537, 9.00999999046, 6.0, 7.0, 12.0099999905, 7.00999999046, 10.0099999905, 3.01000022888, 19.0199999809, 11.0099999905, 20.0199999809, 8.0, 14.0199999809, 4.01000022888, 5.01000022888, 38.0399999619, 46.0499999523, 40.0399999619, 20.0299999714, 16.0199999809, 18.0299999714, 9.01999998093, 11.0199999809, 21.0199999809, -10651.4099998, -4643.13999987, -6388.92000008, -5779.98000002])
In [9]: i
Out[9]: Float64Index([1.0, 0.0, 1.00999999046, 2.0, 2.00999999046, 3.0, 1.01000022888, 3.00999999046, 4.00999999046, 4.0, 6.00999999046, 5.00999999046, 8.00999999046, 2.01000022888, 5.0, 0.990000009537, 9.00999999046, 6.0, 7.0, 12.0099999905, 7.00999999046, 10.0099999905, 3.01000022888, 19.0199999809, 11.0099999905, 20.0199999809, 8.0, 14.0199999809, 4.01000022888, 5.01000022888, 38.0399999619, 46.0499999523, 40.0399999619, 20.0299999714, 16.0199999809, 18.0299999714, 9.01999998093, 11.0199999809, 21.0199999809, -10651.4099998, -4643.13999987, -6388.92000008, -5779.98000002], dtype='object')
In [10]: s = Series(i.tolist() * 3)
In [13]: s.value_counts()[1.00999999046]
Out[13]: 3
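Note that the index values are displayed truncated (the full values are still there; the display just does not show more than two decimal places).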
In [14]: s.value_counts().sort_index()
Out[14]:
-10651.41 3
-6388.92 3
-5779.98 3
-4643.14 3
0.00 3
0.99 3
1.00 3
1.01 3
1.01 3
2.00 3
2.01 3
2.01 3
3.00 3
3.01 3
3.01 3
4.00 3
4.01 3
4.01 3
5.00 3
5.01 3
5.01 3
6.00 3
6.01 3
7.00 3
7.01 3
8.00 3
8.01 3
9.01 3
9.02 3
10.01 3
11.01 3
11.02 3
12.01 3
14.02 3
16.02 3
18.03 3
19.02 3
20.02 3
20.03 3
21.02 3
38.04 3
40.04 3
46.05 3
dtype: int64
In [15]: s.value_counts()[1.00999999046]
Out[15]: 3
In [16]: s.value_counts().keys()
Out[16]: Float64Index([3.00999999046, 14.0199999809, 2.00999999046, -10651.4099998, 2.01000022888, 18.0299999714, 20.0299999714, 16.0199999809, 6.00999999046, 3.01000022888, 8.0, 11.0199999809, 19.0199999809, 7.0, 1.01000022888, 0.990000009537, 4.0, 3.0, 2.0, 1.0, 46.0499999523, 11.0099999905, 12.0099999905, 4.00999999046, 40.0399999619, 7.00999999046, 9.01999998093, 6.0, -6388.92000008, 21.0199999809, 38.0399999619, 5.0, 20.0199999809, 4.01000022888, -5779.98000002, 1.00999999046, 9.00999999046, -4643.13999987, 5.01000022888, 10.0099999905, 8.00999999046, 5.00999999046, 0.0], dtype='object')
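Because printed keys are shortened, a literal copied from the output may not equal the stored value. One possible workaround, sketched here under the assumption that the data came from float32 values (which keys such as 1.00999999046 resemble), is to match keys within a tolerance using numpy.isclose. The data and the tolerance below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: float32 values upcast to float64 keep float32 rounding noise
values = np.array([1.01, 1.01, 2.0], dtype=np.float32).astype(np.float64)
counts = pd.Series(values).value_counts()

# The stored key is not exactly 1.01, so an exact match on the literal fails
print(1.01 in counts.index)  # False

# Select the keys that lie within a small tolerance of the intended value
mask = np.isclose(counts.index.to_numpy(dtype=float), 1.01, atol=1e-6)
print(counts[mask])
```

The tolerance should be smaller than the spacing between distinct keys; otherwise neighbouring keys such as 1.00999999046 and 1.01000022888 would both match.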