PySpark: I need to join multiple DataFrames, i.e. the output of the first join has to be joined with the third DataFrame, and so on.

if i >= length and i > 2:
    j = j + 1
    print i
    print j
    #line.show()
    second = "newdf{}".format(j + 1)
    first = "newdf{}".format(j)
    third = "newdf{}".format(j + 2)
    print first
    print second
    print third
    # newdf1.show()
    print "one"
    #if (line == "/MTDSumOfCustomerInitiatedTrxn"):
    #first="enhanced_df{}" .format(i -1)
    #print first
    #first.show()
    #final=enhanced_df{}.join(enhanced_df{},'ENT_CUST_ID','outer') .format(i,i -1)
    #stat="{},{}" .format(first,second)
    #print stat
    b = "prevDf=GenericFunctions.enhanced_customer({},{},'ENT_CUST_ID')".format(second, first)
    print b
    exec(b)
    prevDf.show(i)
    c = "Finaldf=GenericFunctions.enhanced_customer(prevDf,{},'ENT_CUST_ID')".format(third)

1 Answer

User

#1 · Posted 2024-05-16 09:28:47

I tried this with a user-defined helper function. You just need to handle the column names properly.

>>> l = [(1,2,3,4),(3,4,5,6)]
>>> df = spark.createDataFrame(l,['col1','col2','col3','col4'])
>>> df.show()
+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
|   1|   2|   3|   4|
|   3|   4|   5|   6|
+----+----+----+----+

>>> l = [(1,7,8,9),(3,9,5,7)]
>>> df1 = spark.createDataFrame(l,['col1','col2','col3','col4'])
>>> df1.show()
+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
|   1|   7|   8|   9|
|   3|   9|   5|   7|
+----+----+----+----+

>>> l = [(1,89,45,67),(3,23,34,56)]
>>> df2 = spark.createDataFrame(l,['col1','col2','col3','col4'])
>>> df2.show()
+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
|   1|  89|  45|  67|
|   3|  23|  34|  56|
+----+----+----+----+

>>> l = [(3,65,21,32),(1,87,64,35)]
>>> df3 = spark.createDataFrame(l,['col1','col2','col3','col4'])
>>> df3.show()
+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
|   3|  65|  21|  32|
|   1|  87|  64|  35|
+----+----+----+----+

>>> l = [(1,99,101,345),(3,67,53,21)]
>>> df4 = spark.createDataFrame(l,['col1','col2','col3','col4'])
>>> df4.show()
+----+----+----+----+
|col1|col2|col3|col4|
+----+----+----+----+
|   1|  99| 101| 345|
|   3|  67|  53|  21|
+----+----+----+----+

>>> def join_udf(df0, *dfs):
...     # Start from df0 and join each remaining DataFrame in turn on 'col1'.
...     # With no extra DataFrames, df0 is returned unchanged.
...     prevdf = df0
...     for d in dfs:
...         prevdf = prevdf.join(d, 'col1')
...     return prevdf
...
>>> jdf = join_udf(df,df1,df2,df3,df4)
>>> jdf.show()
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
|col1|col2|col3|col4|col2|col3|col4|col2|col3|col4|col2|col3|col4|col2|col3|col4|
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
|   1|   2|   3|   4|   7|   8|   9|  89|  45|  67|  87|  64|  35|  99| 101| 345|
|   3|   4|   5|   6|   9|   5|   7|  23|  34|  56|  65|  21|  32|  67|  53|  21|
+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+----+
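
Note that the joined result above contains several columns named col2, col3 and col4, so selecting any of them by name afterwards is ambiguous. That is the column-name handling mentioned at the top of this answer. One possible way to deal with it, as a rough sketch only: rename the non-key columns before joining and fold the list with functools.reduce. The names with_suffix and join_all and the numeric suffix scheme are made up for this illustration and are not part of PySpark; the only DataFrame members assumed are columns, withColumnRenamed and join.

```python
from functools import reduce


def with_suffix(df, suffix, key):
    """Rename every non-key column to '<name>_<suffix>' (hypothetical helper)."""
    for name in df.columns:
        if name != key:
            df = df.withColumnRenamed(name, "{}_{}".format(name, suffix))
    return df


def join_all(frames, key, how="inner"):
    """Join a list of DataFrames in order on `key` (hypothetical helper).

    Each frame's non-key columns get the frame's list position as a suffix,
    so the joined result has unique column names.
    """
    if not frames:
        raise ValueError("join_all needs at least one DataFrame")
    renamed = [with_suffix(df, i, key) for i, df in enumerate(frames)]
    # reduce joins the first two frames, then joins that result with the
    # third frame, and so on, which is the chaining asked for in the question.
    return reduce(lambda left, right: left.join(right, on=key, how=how), renamed)
```

With the DataFrames from this answer, join_all([df, df1, df2, df3, df4], 'col1') should give the columns col1, col2_0, col3_0, col4_0, col2_1, and so on. For the outer join from the question, something like join_all(frames, 'ENT_CUST_ID', how='outer') would apply, assuming the newdf DataFrames are kept in a list called frames rather than looked up by name with exec.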
