Flattening JSON by joining on timestamp

Posted 2024-05-23 23:17:01


I want to flatten JSON by joining on the timestamp. This is similar to the question "Flattening a nested JSON to multiple rows", but I need solutions in Java, Scala, Spark, and PySpark.

Input JSON

{ "Sensor": "seda_01", "Location": { "City": "Los Angeles", "State": "CA" }, "rain_value": [ [ "1564073521", "0.02" ], [ "1564073522", "0.01" ], [ "1564073523", "0.03" ] ], "sun_value": [ [ "1564073521", "0.11" ], [ "1564073522", "0.10" ], [ "1564073523", "0.13" ] ], "wind_value": [ [ "1564073521", "0.21" ], [ "1564073522", "0.21" ], [ "1564073523", "0.23" ] ] }
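As a starting point, here is a minimal pure-Python sketch (standard library only, not Spark) of the parsing step. It flattens the nested `Location` object into `Location_City`/`Location_State` columns and indexes each `*_value` array of `[timestamp, value]` pairs by timestamp. The helper names `flatten_location` and `index_by_timestamp` are hypothetical, and the sketch assumes every key ending in `_value` holds such pairs; values are kept as strings, as in the input.

```python
import json

# The sample record from the input above.
RAW = '''{ "Sensor": "seda_01", "Location": { "City": "Los Angeles", "State": "CA" },
  "rain_value": [ [ "1564073521", "0.02" ], [ "1564073522", "0.01" ], [ "1564073523", "0.03" ] ],
  "sun_value": [ [ "1564073521", "0.11" ], [ "1564073522", "0.10" ], [ "1564073523", "0.13" ] ],
  "wind_value": [ [ "1564073521", "0.21" ], [ "1564073522", "0.21" ], [ "1564073523", "0.23" ] ] }'''


def flatten_location(record):
    """Hypothetical helper: turn the nested Location object into Location_<key> columns."""
    return {f"Location_{key}": value for key, value in record["Location"].items()}


def index_by_timestamp(pairs):
    """Hypothetical helper: map a list of [timestamp, value] pairs to {timestamp: value}."""
    return {ts: value for ts, value in pairs}


record = json.loads(RAW)
flat = {"Sensor": record["Sensor"], **flatten_location(record)}
# Assumption: every key ending in "_value" holds [timestamp, value] pairs.
series = {key: index_by_timestamp(pairs)
          for key, pairs in record.items() if key.endswith("_value")}
```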

Output DataFrame

| Sensor | Location_City | Location_State | Rain_value_TS | Rain_value | Sun_value_TS | Sun_value |
|---|---|---|---|---|---|---|
| seda_01 | Los Angeles | CA | 1564073521 | 0.02 | 1564073521 | 0.11 |
| seda_01 | Los Angeles | CA | 1564073522 | 0.01 | 1564073522 | 0.10 |

Note: Rain_value_TS = Sun_value_TS, so either one can serve as the timestamp. If, for a given timestamp, we only have a Rain_value, we can fill in NULL for Sun_value.
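The join described in the note can be sketched in pure Python (again not Spark): collect every timestamp that appears in any requested metric, emit one row per timestamp, and put `None` (NULL) wherever a metric has no reading. That is a full outer join on the timestamp. The function name `flatten_on_timestamp` and the single `Timestamp` column are hypothetical choices. The sample record is the input above with `wind_value` dropped and the last `sun_value` pair removed, purely to illustrate the NULL case.

```python
def flatten_on_timestamp(record, metrics):
    """Hypothetical helper: one row per timestamp found in any of `metrics`.

    This is a full outer join on the timestamp; a metric with no reading
    at a given timestamp gets None, i.e. NULL.
    """
    base = {
        "Sensor": record["Sensor"],
        "Location_City": record["Location"]["City"],
        "Location_State": record["Location"]["State"],
    }
    # {metric: {timestamp: value}}; a metric missing from the record becomes empty.
    by_ts = {m: dict(record.get(m, [])) for m in metrics}
    # Union of all timestamps, ordered numerically (they are epoch-second strings).
    all_ts = sorted(set().union(*by_ts.values()), key=int)

    rows = []
    for ts in all_ts:
        row = dict(base, Timestamp=ts)
        for m in metrics:
            # Column name follows the table header style: rain_value -> Rain_value.
            row[m.capitalize()] = by_ts[m].get(ts)
        rows.append(row)
    return rows


record = {
    "Sensor": "seda_01",
    "Location": {"City": "Los Angeles", "State": "CA"},
    "rain_value": [["1564073521", "0.02"], ["1564073522", "0.01"], ["1564073523", "0.03"]],
    # Trimmed for illustration: no sun reading at 1564073523.
    "sun_value": [["1564073521", "0.11"], ["1564073522", "0.10"]],
}
rows = flatten_on_timestamp(record, ["rain_value", "sun_value"])
```

In Spark, a similar shape might come from exploding each array into (timestamp, value) rows per metric with `pyspark.sql.functions.explode` and combining them with `DataFrame.join(..., how="outer")` on the timestamp column; that mapping is an untested suggestion, not a verified solution.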

