from pyspark.sql.functions import mean, desc

# A comment after a trailing backslash is a syntax error in Python,
# so the chain is wrapped in parentheses instead.
(df.filter(df["country"] == "france")  # only French stations
    .groupBy("station_id")  # group by station
    .agg(mean("temperature").alias("average_temp"))  # average temperature per station
    .orderBy(desc("average_temp"))  # order by average, descending
    .take(100))  # return the first 100 rows
The code above is how we would translate your query to Spark SQL. The same query can also be written using the RDD API and anonymous functions.
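The RDD version is not preserved in this answer, so what follows is only a minimal sketch of how it might look, not the original code. It assumes the RDD holds plain tuples laid out as `(country, station_id, temperature)`; that layout and the function name `top_stations_by_mean_temp` are hypothetical and chosen for illustration only.

```python
def top_stations_by_mean_temp(rdd, country="france", n=100):
    """Return up to n (station_id, mean_temperature) pairs, highest mean first.

    Assumption: each element of `rdd` is a tuple
    (country, station_id, temperature). Adjust the indices for other layouts.
    """
    return (
        rdd.filter(lambda row: row[0] == country)               # keep one country
        .map(lambda row: (row[1], (row[2], 1)))                 # station -> (sum, count)
        .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))   # add sums and counts
        .mapValues(lambda s: s[0] / s[1])                       # sum / count = mean
        .takeOrdered(n, key=lambda kv: -kv[1])                  # n largest means
    )

# Usage, assuming an existing SparkContext named `sc`:
#   rows = sc.parallelize([("france", "a", 10.0), ("france", "b", 30.0)])
#   top_stations_by_mean_temp(rows)
```

Carrying a `(sum, count)` pair through `reduceByKey` avoids collecting every reading per station, which `groupByKey` would do. `takeOrdered` with a negated key stands in for the `orderBy(desc(...))` followed by `take(100)` in the DataFrame version.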