How do I retrieve two numbers from a DataFrame?

Posted 2024-04-18 12:41:02


I have the following function in PySpark:

import pyspark.sql.functions as func

def get_num(self, spark, id):

    df = spark \
        .read \
        .format("org.elasticsearch.spark.sql") \
        .load("myindex") \
        .filter(func.col("id") == id) \
        .groupBy("id") \
        .agg(
                func.count(func.lit(1)).alias("number_occurrences_today"),
                func.countDistinct("host_id").alias("number_hosts")
            )

If df is None, the function should return 0, 0. Otherwise, it should return the values of number_occurrences_today and number_hosts for the given id.

How can I do this?

This is what I have tried so far:

    if (df is None):
        return 0, 0
    else:
        return df["number_occurrences_today"], df["number_hosts"]
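One possible direction, offered as a sketch rather than a definitive answer: a DataFrame built with spark.read ... .agg(...) is never None, so the `df is None` check does not detect "no match"; when no row passes the filter, the result is just an empty DataFrame. Likewise, df["number_occurrences_today"] yields a Column object, not a number. Fetching the single aggregated row with DataFrame.first(), which returns None for an empty DataFrame, covers both cases. The helper name extract_counts is hypothetical and not part of the original code.

```python
def extract_counts(df):
    """Return (number_occurrences_today, number_hosts) from the aggregated
    DataFrame, or (0, 0) when no row matched the filter.

    Assumes df has at most one row, as is the case after filtering on a
    single id and grouping by id.
    """
    # first() collects one row to the driver; it returns None if df is empty.
    row = df.first()
    if row is None:
        return 0, 0
    # A Row supports lookup by column name and holds plain Python values.
    return row["number_occurrences_today"], row["number_hosts"]
```

Inside get_num, the last statement would then be `return extract_counts(df)` in place of the if/else above.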
