Comparison of a float with `np.nan` in a Spark DataFrame

Posted 2024-04-23 05:30:28


Is this expected behavior? I was about to raise an issue about Spark, but this seems like such basic functionality that it is hard to imagine a bug here. What am I missing?

Python

>>> import numpy as np
>>> np.nan < 0.0
False
>>> np.nan > 0.0
False

PySpark

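The original PySpark snippet was not preserved; a minimal sketch of the kind of comparison being asked about (the column name and values here are illustrative) might be:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Illustrative single-column DataFrame containing a NaN value
df = spark.createDataFrame([(float("nan"),), (1.0,)], ["x"])

# Unlike plain Python/NumPy, Spark treats NaN as larger than any other
# numeric value, so this comparison is expected to return true for the NaN row
df.select((df["x"] > 0.0).alias("x_gt_0")).show()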

1 answer

Answer #1 · Posted 2024-04-23 05:30:28

This is both expected and documented behavior. Quoting the NaN Semantics section of the official Spark SQL Guide (emphasis mine):

There is specially handling for not-a-number (NaN) when dealing with float or double types that does not exactly match standard floating point semantics. Specifically:

  • NaN = NaN returns true.
  • In aggregations, all NaN values are grouped together.
  • NaN is treated as a normal value in join keys.
  • NaN values go last when in ascending order, larger than any other numeric value.
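The last bullet is easy to check directly; a minimal sketch, assuming an illustrative session, column name, and data:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([(float("nan"),), (1.0,), (0.0,)], ["x"])

# Ascending sort: the NaN row is expected to come last, after 1.0
df.orderBy("x").show()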

You are simply seeing the difference between plain Python / NumPy behavior and Spark's behavior. In particular, Spark considers NaN values equal:

spark.sql("""
    WITH table AS (SELECT CAST('NaN' AS float) AS x, cast('NaN' AS float) AS y) 
    SELECT x = y, x != y FROM table
""").show()
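The result table was not preserved in the post; given the semantics above, the two NaN casts compare equal, so the output should look roughly like this (exact column headers may vary by Spark version):

+-------+-------------+
|(x = y)|(NOT (x = y))|
+-------+-------------+
|   true|        false|
+-------+-------------+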

whereas plain Python

float("NaN") == float("NaN"), float("NaN") != float("NaN")
(False, True)

and NumPy

np.nan == np.nan, np.nan != np.nan
(False, True)

do not.

You can check the relevant docstring for additional examples.

So to get the desired result, you have to check for NaN explicitly:

from pyspark.sql.functions import col, isnan, when

# If either side is NaN, return False; otherwise fall back to the normal comparison
when(isnan("_1") | isnan("_2"), False).otherwise(col("_1") > col("_2"))
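For context, a minimal usage sketch, assuming a DataFrame whose columns carry the default tuple names _1 and _2:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, isnan, when

spark = SparkSession.builder.getOrCreate()

# Hypothetical data; createDataFrame names tuple columns _1 and _2 by default
df = spark.createDataFrame([(float("nan"), 0.0), (2.0, 1.0)])

nan_safe_gt = when(isnan("_1") | isnan("_2"), False).otherwise(col("_1") > col("_2"))

# The NaN row should now yield false, matching Python/NumPy expectations
df.select(nan_safe_gt.alias("gt")).show()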
