I have a pyspark.sql.dataframe. The DataFrame has the following structure, and all months are present for every country:
+----------+-------+------------------+
|DATE |COUNTRY|AVG_TEMPS |
+----------+-------+------------------+
|2007-01-01|Åland |0.5939999999999999|
|2007-02-01|Åland |-4.042 |
|2007-03-01|Åland |2.443 |
|2007-04-01|Åland |4.621 |
|2007-05-01|Åland |8.411 |
|2007-06-01|Åland |13.722999999999999|
|2007-07-01|Åland |15.749 |
+----------+-------+------------------+
The expected output is a map column (like a Python dictionary), similar to the question linked below:
pyspark - create DataFrame Grouping columns in map type structure
-----------------------------------------
| DATE | COUNTRY_TEMP |
-----------------------------------------
|2007-01-01|{Åland: 0.593, Alfredo:2.44}|
|2007-01-02| {Åland: 0.57, Alfredo:2.14}|
-----------------------------------------
When I tried it myself, I got an error:
df_converted = newres.groupBy('DATE').\
agg(collect_list(create_map(col("COUNTRY"))))
Error:
AnalysisException: u"cannot resolve 'map(`COUNTRY`)' due to data type mismatch: map expects a positive even number of arguments.
;;\n'Aggregate [DATE#179], [DATE#179, collect_list(map(COUNTRY#180), 0, 0) AS collect_list(map(COUNTRY))#189]\n+- Project [DATE#146 AS DATE#179,
COUNTRY#85 AS COUNTRY#180, AVG_TEMPS#147 AS AVG_TEMPS#181]\n +- Project [dt#82 AS DATE#146, COUNTRY#85, AverageTemperature#83 AS AVG_TEMPS#147]
\n +- SubqueryAlias global_temps_by_cntry\n +- Relation[dt#82,AverageTemperature#83,AverageTemperatureUncertainty#84,Country#85] csv\n"
Can anyone help?
As @user3689574 pointed out, try adding the value column to create_map: