Creating a Spark DataFrame from a list of dictionaries with different structures

Posted 2024-04-16 11:48:05

I have a list of dictionaries, for example:

list_ = [
 {u'column1': u'test1', u'column2': u'None'},
 {u'added_column1': u'test2', u'column2': u'None'}]

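The first row has two columns: column1 and column2.

The second row has two columns: added_column1 and column2.

I want to create a Spark DataFrame from this data, and its columns should adapt as the dictionaries in the list change.

Is there a long-term solution for this?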

Currently I use:

^{pr2}$

This works, but I get the following warning:

UserWarning: inferring schema from dict is deprecated,please use pyspark.sql.Row instead
  warnings.warn("inferring schema from dict is deprecated,"

1 answer

User

#1 · Posted 2024-04-16 11:48:05

You can call toDF() on an RDD and specify the sample ratio, that is, the fraction of rows used to infer the schema when converting to a DataFrame.

list_ = [
 {u'column1': u'test1', u'column2': u'None'},
 {u'added_column1': u'test2', u'column2': u'None'}]

# sc is an existing SparkContext (for example, spark.sparkContext)
sc.parallelize(list_).toDF(sampleRatio=0.9).show()

Creating the DataFrame from Row objects (built from the dicts) instead requires every row to have the same number of columns:

from pyspark.sql import Row
spark.createDataFrame(list(map(lambda x: Row(**x), list_))).show()

The code above fails with the error: Input row doesn't have expected number of values required by the schema. 3 fields are required while 2 values are provided.
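
One possible way around both the deprecation warning and the field-count error is to avoid schema inference altogether. First collect the union of all keys, then pad each dictionary with None for the keys it lacks, and finally pass an explicit schema. The following is only a sketch under assumptions. The helper names union_of_keys, normalize and to_dataframe are hypothetical and not part of PySpark. Treating every column as a nullable string is an assumption based on the sample data. The spark argument is assumed to be an existing SparkSession.

```python
def union_of_keys(records):
    """Collect every key seen in any record, in first-seen order."""
    seen = {}
    for record in records:
        for key in record:
            seen.setdefault(key, None)
    return list(seen)


def normalize(records, columns):
    """Return one tuple per record, aligned to `columns`; missing keys become None."""
    return [tuple(record.get(col) for col in columns) for record in records]


list_ = [
    {u'column1': u'test1', u'column2': u'None'},
    {u'added_column1': u'test2', u'column2': u'None'},
]

columns = union_of_keys(list_)
rows = normalize(list_, columns)


def to_dataframe(spark, rows, columns):
    """Build a DataFrame with an explicit schema (assumption: all columns are nullable strings)."""
    # Imported lazily so the pure-Python helpers above work without PySpark installed.
    from pyspark.sql.types import StructField, StructType, StringType

    schema = StructType([StructField(col, StringType(), True) for col in columns])
    return spark.createDataFrame(rows, schema)
```

With a live session, to_dataframe(spark, rows, columns).show() should then display one column per distinct key. Because the column list is recomputed from the data, the schema follows the list as new keys appear.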
