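Trying to wrap my head around this syntactically... it seems like a tough one. Basically, if a sensor item was not captured in the source data for a given time-series timestamp interval, I want to add a row for each missing sensor item, with a NULL value, for every timestamp window.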
# list of sensor items [have 300 plus; only showing 4 as example]
# (named sensor_list rather than list, to avoid shadowing the Python builtin)
sensor_list = ["temp", "pressure", "vacuum", "burner"]
# sample data (assumes an active SparkSession named spark)
df = spark.createDataFrame([('2019-05-10 7:30:05', 'temp', '99'),
    ('2019-05-10 7:30:05', 'burner', 'TRUE'),
    ('2019-05-10 7:30:10', 'vacuum', '.15'),
    ('2019-05-10 7:30:10', 'burner', 'FALSE'),
    ('2019-05-10 7:30:10', 'temp', '75'),
    ('2019-05-10 7:30:15', 'temp', '77'),
    ('2019-05-10 7:30:20', 'pressure', '.22'),
    ('2019-05-10 7:30:20', 'temp', '101')], ["date", "item", "value"])
# current dilemma => not every sensor item is captured at each timestamp; the streaming devices in the current back-end design only capture updates to sensors
+------------------+--------+-----+
| date| item|value|
+------------------+--------+-----+
|2019-05-10 7:30:05| temp| 99|
|2019-05-10 7:30:05| burner| TRUE|
|2019-05-10 7:30:10| vacuum| .15|
|2019-05-10 7:30:10| burner|FALSE|
|2019-05-10 7:30:10| temp| 75|
|2019-05-10 7:30:15| temp| 77|
|2019-05-10 7:30:20|pressure| .22|
|2019-05-10 7:30:20| temp| 101|
+------------------+--------+-----+
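I want to capture every sensor item at every timestamp so that forward-fill imputation can be performed before pivoting the dataframe [forward-filling 300+ columns causes a Scala error => Spark Caused by: java.lang.StackOverflowError Window Function?]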
# desired output
+------------------+--------+-----+
| date| item|value|
+------------------+--------+-----+
|2019-05-10 7:30:05| temp| 99|
|2019-05-10 7:30:05| burner| TRUE|
|2019-05-10 7:30:05| vacuum| NULL|
|2019-05-10 7:30:05|pressure| NULL|
|2019-05-10 7:30:10| vacuum| .15|
|2019-05-10 7:30:10| burner|FALSE|
|2019-05-10 7:30:10| temp| 75|
|2019-05-10 7:30:10|pressure| NULL|
|2019-05-10 7:30:15| temp| 77|
|2019-05-10 7:30:15|pressure| NULL|
|2019-05-10 7:30:15| burner| NULL|
|2019-05-10 7:30:15| vacuum| NULL|
|2019-05-10 7:30:20|pressure| .22|
|2019-05-10 7:30:20| temp| 101|
|2019-05-10 7:30:20| vacuum| NULL|
|2019-05-10 7:30:20| burner| NULL|
+------------------+--------+-----+
Expanding on my comment:
You can right join your dataframe with the Cartesian product of the distinct dates and sensor_list. Because sensor_list is small, you can broadcast it.