PySpark: converting a standard list to a DataFrame

    from pyspark.sql.types import StructType
    from pyspark.sql.types import StructField
    from pyspark.sql.types import StringType, IntegerType

    schema = StructType([StructField("value", IntegerType(), True)])
    my_list = [1, 2, 3, 4]
    rdd = sc.parallelize(my_list)
    df = sqlContext.createDataFrame(rdd, schema)
    df.show()

2 answers

User

Answer 1 · Edited 2024-04-25 20:31:55

See the code below:

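    from pyspark.sql import Row

    li = [1, 2, 3, 4]
    rdd1 = sc.parallelize(li)
    # Wrap each integer in a Row so it matches a one-column schema
    row_rdd = rdd1.map(lambda x: Row(x))
    # show() returns None, so keep the DataFrame and display it separately
    df = sqlContext.createDataFrame(row_rdd, ['numbers'])
    df.show()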

Output:

+-------+
|numbers|
+-------+
|      1|
|      2|
|      3|
|      4|
+-------+
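
A minimal sketch of the same idea without the RDD step: the `Row` wrapping in the code above can also be done on the plain Python list by putting each value in a one-element tuple. The helper names `wrap_as_rows` and `list_to_dataframe` are hypothetical (not from the original answer), and `spark` is assumed to be an existing `SparkSession`, as in the pyspark shell.

```python
def wrap_as_rows(values):
    """Turn [1, 2, 3] into [(1,), (2,), (3,)] so each element is a one-field row."""
    return [(v,) for v in values]


def list_to_dataframe(spark, values, column="numbers"):
    """Build a one-column DataFrame.

    Assumption: `spark` is an already-created SparkSession; this function
    is only defined here, not called, so the snippet runs without Spark.
    """
    return spark.createDataFrame(wrap_as_rows(values), [column])


rows = wrap_as_rows([1, 2, 3, 4])
print(rows)  # [(1,), (2,), (3,), (4,)]
```

In a session where `spark` exists, `list_to_dataframe(spark, [1, 2, 3, 4]).show()` should print the same single-column table as above.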

User

Answer 2 · Edited 2024-04-25 20:31:55

This solution also uses less code, avoids serializing to an RDD, and is probably easier to understand:

    from pyspark.sql.types import IntegerType

    # notice the variable name (more below)
    mylist = [1, 2, 3, 4]

    # notice the parens after the type name
    spark.createDataFrame(mylist, IntegerType()).show()

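Note on naming a variable `list`: `list` is a Python built-in (the list type and its constructor), so it is strongly recommended not to use it, or any other built-in name, as a variable name, because doing so shadows `list()` for the rest of that scope. For quick-and-dirty prototyping, many people use something like `mylist` instead.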
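
To make the naming note concrete, here is a small plain-Python sketch (no Spark needed) of what goes wrong when a variable is called `list`. The function name `demo_shadowing` is made up for illustration; the shadowing is kept inside the function so the built-in stays intact everywhere else.

```python
def demo_shadowing():
    # Binding the name `list` here hides the built-in inside this function.
    list = [1, 2, 3, 4]
    try:
        list("abc")  # now tries to call a list object, not the constructor
        return "no error"
    except TypeError as exc:
        return type(exc).__name__


result = demo_shadowing()
print(result)  # TypeError

# With a non-clashing name, the built-in constructor keeps working.
mylist = [1, 2, 3, 4]
mylist_copy = list(mylist)
```

At module level the effect is the same but lasts until the name is deleted (`del list`), which is why a name like `mylist` is the safer habit.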
