How do I save nested or JSON objects in a Spark Dataset and convert them to an RDD in Java?
I am writing Spark code in which I have to save multiple column values in an object format and write the result to MongoDB.
Given the dataset

| A  | A_SRC | Past_A | Past_A_SRC |
|----|-------|--------|------------|
| a1 | s1    | a2     | s2         |
What I tried:
val ds1 = Seq(("1", "2", "3", "4")).toDF("a", "src", "p_a", "p_src")
val recordCol = functions.to_json(struct("a", "src", "p_a", "p_src")) as "A"
ds1.select(recordCol).show(truncate = false)
gives me
+-----------------------------------------+
|A |
+-----------------------------------------+
|{"a":"1","src":"2","p_a":"3","p_src":"4"}|
+-----------------------------------------+
What I expect is something like this:
+-------------------------------------------------------+
|A                                                      |
+-------------------------------------------------------+
|{"source":"1","value":"2","p_source":"4","p_value":"3"}|
+-------------------------------------------------------+
How can I change the keys in the JSON object to something other than the column names? Using a Map in Java?
# Answer 1
You can call `as` (an alias) on each column you pass into `struct`, so that the field is saved under the name you provide instead of the original column name.
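A minimal sketch of that approach, reusing the column names from the question (the alias names `source`, `value`, `p_source`, `p_value` are taken from the expected output shown above):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{struct, to_json}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val ds1 = Seq(("1", "2", "3", "4")).toDF("a", "src", "p_a", "p_src")

// Alias each column inside struct(); to_json then uses the
// aliases as the JSON keys instead of the column names.
val recordCol = to_json(struct(
  $"a".as("source"),
  $"src".as("value"),
  $"p_src".as("p_source"),
  $"p_a".as("p_value")
)) as "A"

ds1.select(recordCol).show(truncate = false)
// +-------------------------------------------------------+
// |A                                                      |
// +-------------------------------------------------------+
// |{"source":"1","value":"2","p_source":"4","p_value":"3"}|
// +-------------------------------------------------------+
```

The fields appear in the JSON in the order they are listed in `struct`, so reordering the arguments also reorders the keys.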