There is special handling for not-a-number (NaN) when dealing with float or double types that does not exactly match standard floating-point semantics. Specifically:
NaN = NaN returns true.
In aggregations, all NaN values are grouped together.
NaN is treated as a normal value in join keys.
NaN values go last when in ascending order, larger than any other numeric value.
What you are seeing is the difference between plain Python behavior and Spark's behavior. In particular, Spark considers NaN values equal:
spark.sql("""
WITH table AS (SELECT CAST('NaN' AS float) AS x, CAST('NaN' AS float) AS y)
SELECT x = y, x != y FROM table
""").show()
This is both expected and documented behavior. The rules listed at the top of this answer are quoted from the NaN Semantics section of the official Spark SQL Guide. Plain Python and NumPy, by contrast, do not treat NaN values as equal:
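A minimal pure-Python illustration of that difference (`numpy.nan` behaves the same way, since both follow IEEE 754 comparison semantics):

```python
import math

a = float("nan")
b = float("nan")

# Under IEEE 754 semantics, NaN is not equal to anything, including itself.
print(a == b)  # False
print(a != b)  # True

# The reliable way to detect NaN in plain Python:
print(math.isnan(a))  # True
```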
You can check the ^{} docstring for additional examples. So, to get the result you expect, you have to check for NaN explicitly:
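In PySpark, such an explicit check is typically done with `pyspark.sql.functions.isnan`. The same idea in plain Python, as a sketch showing both conventions side by side (the helper names below are illustrative, not part of any API):

```python
import math

def eq_nan_unequal(a: float, b: float) -> bool:
    # IEEE/plain-Python semantics made explicit: NaN matches nothing.
    return not (math.isnan(a) or math.isnan(b)) and a == b

def eq_nan_equal(a: float, b: float) -> bool:
    # Spark-style semantics: two NaN values compare equal.
    if math.isnan(a) and math.isnan(b):
        return True
    return a == b

print(eq_nan_unequal(float("nan"), float("nan")))  # False
print(eq_nan_equal(float("nan"), float("nan")))    # True
```

Whichever convention you need, making the NaN check explicit avoids surprises when the data crosses the boundary between Python and Spark.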