Merging two Spark DataFrames into one schema using Python

schema1: (id, type, count) -- type takes the values type1, type2, type3
schema2: (id, timestamp, test1, test2, test3)
finalschema: (id, timestamp, test1, test2, test3, type1count, type2count, type3count)

2 Answers

User

Answer 1 · edited 2024-06-07 06:06:38

You can join the two DataFrames above on the id column. Here is a sample snippet that does this:

df1 schema is (id, type, count).
df2 schema is (id, timestamp, test1, test2, test3).

merged_df = df1.join(df2, on=['id'], how='left_outer')

Hope this helps.

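User

Answer 2 · edited 2024-06-07 06:06:38

Before joining the first DataFrame to the second, you can use the PySpark pivot function to pivot the first DataFrame, so that each value of type becomes its own column.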

Working example:

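from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# Reuse the active session, or create one when running outside a notebook
spark = SparkSession.builder.getOrCreate()

# Sample rows in the shape of the first DataFrame: (id, type, quantity)
df = spark.createDataFrame([[1,'type1',10],
                            [1,'type2',10],
                            [1,'type3',10]],
                           schema=['id','type','quantity'])

# One row per id and one column per distinct type, summing quantity
df = df.groupBy('id').pivot('type').sum('quantity')
df.show()  # display(df) also works in notebook environments such as Databricks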

You can change the aggregation to whatever you need.
