Apache Beam job fails when applying session windows to large data

Posted on 2024-04-25 14:14:36


I'm working on a Python Apache Beam job that applies session windows to a bounded dataset. It works fine on smaller datasets, but the job gets killed when I increase the size of the input data.

The job ID is 2019-06-10_07_28_32-2942508228086251217.

elements = (p | 'IngestData' >> beam.io.Read(big_query_source))

(elements
    | 'AddEventTimestamp' >> beam.ParDo(AddTimestampDoFn())
    | 'SessionWindow' >> beam.WindowInto(window.Sessions(10 * 60))
    | 'CreateTuple' >> beam.Map(lambda row: (row['id'], {'attribute1': row['attribute1'], 'date': row['date']}))
    | 'GroupById1' >> beam.GroupByKey()
    | 'AggregateSessions' >> beam.ParDo(AggregateTransactions())
    | 'MergeWindows' >> beam.WindowInto(window.GlobalWindows())
    | 'GroupById2' >> beam.GroupByKey()
    | 'MapSessionsToLists' >> beam.Map(lambda x: (x[0], [y for y in x[1]]))
    | 'BiggestSession' >> beam.ParDo(MaximumSession())
    | 'PrepForWrite' >> beam.Map(lambda x: x[1].update({'id': x[0]}) or x[1])
    | 'WriteResult' >> WriteToText(known_args.output))

The DoFn classes (not shown).

The job fails with the following error: The job failed because a work item has failed 4 times. Look in previous log entries for the cause of each one of the 4 failures. For more information, see https://cloud.google.com/dataflow/docs/guides/common-errors. The work item was attempted on these workers:

The logs for those specific workers in Stackdriver don't hint at anything. I just get a mix of these entries:

processing lull for over 431.44 seconds in state process-msecs in step s5

Refusing to split <dataflow_worker.shuffle.GroupedShuffleRangeTracker object at 0x7f82e970cbd0> at '\n\xaaG\t\x00\x01': proposed split position is out of range

Retry with exponential backoff: waiting for 4.69305060273 seconds before retrying lease_work because we caught exception: SSLError: ('The read operation timed out',)

The remaining entries are informational.

The latest memory usage reported for that particular worker was 43413 MB. Since I'm using n1-highmem-32 machines, I don't think memory is the problem.

On the client side, in Cloud Shell where I triggered the job, I get a lot of

INFO:oauth2client.transport:Refreshing due to a 401 (attempt 1/2)
INFO:oauth2client.transport:Refreshing due to a 401 (attempt 2/2)
INFO:oauth2client.transport:Refreshing due to a 401 (attempt 1/2)
INFO:oauth2client.transport:Refreshing due to a 401 (attempt 1/2)
INFO:oauth2client.transport:Refreshing due to a 401 (attempt 2/2)
INFO:oauth2client.transport:Refreshing due to a 401 (attempt 2/2)
INFO:oauth2client.transport:Refreshing due to a 401 (attempt 1/2)
INFO:oauth2client.transport:Refreshing due to a 401 (attempt 2/2)

before the job fails.

Any ideas?

Thanks


1 Answer

#1 · Posted on 2024-04-25 14:14:36

By default, Dataflow retries a failed work item 4 times when any error occurs in batch mode, and retries indefinitely when running in streaming mode.

Please create a dashboard in Stackdriver for the Compute Engine machines used by the pipeline, to analyze the memory, CPU consumption, and volume of IO operations taking place. The pipeline's machine configuration should be raised only after careful analysis of those factors.

Please make sure that all of your transforms work correctly against the data you provide, and apply exception handling.
