How do I do this in PySpark?

2 Answers

User

#1 · Edited 2024-04-25 19:30:05

Here is a sample DataFrame modeled on the one above. I will use it to work through your problem.

df = spark.createDataFrame(
     [[1, ['Date_Min', 'Date_Max', 'Device'], ['148590', '148590', 'iphone']], 
      [2, ['Date_Min', 'Date_Max', 'Review'], ['148590', '148590', 'Good']],     
      [3, ['Date_Min', 'Date_Max', 'Review', 'Device'], ['148590', '148590', 'Bad', 'samsung']]], 
     schema=['id', 'l1', 'l2'])

Here you can define a UDF that zips the two lists in each row.

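The UDF definition itself did not survive in this copy of the answer, so the following is only a sketch of what it might look like. It assumes the UDF returns an array of structs with string fields named `first` and `second`, which is inferred from the `tmp.first` and `tmp.second` references in the next step. The helper names `zip_pairs` and `make_zip_list_udf` are hypothetical.

```python
def zip_pairs(keys, values):
    """Pair the i-th key with the i-th value; return [] if either list is None."""
    if keys is None or values is None:
        return []
    return list(zip(keys, values))


def make_zip_list_udf():
    """Wrap zip_pairs as a Spark UDF returning array<struct<first, second>>."""
    # Imported inside the function so zip_pairs can be tried without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import ArrayType, StringType, StructField, StructType

    # Assumed return schema: field names taken from tmp.first / tmp.second below.
    pair_type = StructType([
        StructField("first", StringType()),
        StructField("second", StringType()),
    ])
    return udf(zip_pairs, ArrayType(pair_type))
```

With PySpark available, `zip_list = make_zip_list_udf()` would give a `zip_list` that can be called as `zip_list('l1', 'l2')`. Note that `zip` stops at the shorter list, so rows whose two lists differ in length silently lose the extra entries.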

Finally, zip the two columns together and then explode the resulting column.

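from pyspark.sql.functions import col, explode

# Zip l1 and l2 per row, then explode into one row per (Operation, Value) pair.
df_out = df.withColumn("tmp", zip_list('l1', 'l2')).\
    withColumn("tmp", explode("tmp")).\
    select('id', col('tmp.first').alias('Operation'), col('tmp.second').alias('Value'))
df_out.show()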

Output

+---+---------+-------+
| id|Operation|  Value|
+---+---------+-------+
|  1| Date_Min| 148590|
|  1| Date_Max| 148590|
|  1|   Device| iphone|
|  2| Date_Min| 148590|
|  2| Date_Max| 148590|
|  2|   Review|   Good|
|  3| Date_Min| 148590|
|  3| Date_Max| 148590|
|  3|   Review|    Bad|
|  3|   Device|samsung|
+---+---------+-------+

User

#2 · Edited 2024-04-25 19:30:05

If you are using a DataFrame, try this:

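import pyspark.sql.functions as F

# Spark allows only one generator per select, so two explode() calls side by side
# raise an AnalysisException. Zip the arrays first (arrays_zip needs Spark 2.4+).
zipped = your_df.select("id", F.explode(F.arrays_zip("Operation", "Value")).alias("tmp"))
zipped.select("id", "tmp.Operation", "tmp.Value").show()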
