当我打印累加器时，它的值为零

2024-06-08 22:08:34 发布

男 | 程序猿一只，喜欢编程写python代码。

下面是一个示例pyspark代码片段，我试图在其中检查（作为健全性检查）在“filter”转换之后处理了多少个订单。因此，我试图定义一个累加器，并将其用作计数器，以获取“处理的订单数”

    orders=inputpath + "/orders" # Accepting and creating  the "full input path" for input file
    counter=sc.accumulator(0) #defining accumulator

    def OrderTuples(order): #defining a function to incorporate "counter increment" for every records filtered out from filter transformation to the map transformation
        counter.add(1) 
        return (int(order.split(",")[0]),1)

     ordersFiltered = sc.textFile(orders). \
        filter(lambda order : month in order.split(",")[1]). \
        map(lambda order : OrderTuples(order)) # Calling the  function here
    print(f"NO OF ORDERS PROCESSED:{counter}") # printing the accumulator final value here

但作为最终输出，我仍然得到零值。我错在哪里。我是第一次用蓄能器。默认情况下sc.textFile（orders）有2个分区，我使用--num executors 2（13节点集群）感谢您的帮助：）

Tags： the to map for input counter order function

1条回答

网友

1楼 · 发布于 2024-06-08 22:08:34

ordersFiltered需要在实际计算筛选器lambda之前执行一个操作（如collect）

当我打印累加器时，它的值为零

相关问题更多 >

编程相关推荐

热门问题

热门文章

当我打印累加器时，它的值为零

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >