PySpark error: StructType can not accept object in type <type 'int'> - Q&A - Python中文网

Posted 2024-04-23 12:26:20



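My data file describes the edges of a graph. Each line has the format (src node & dest node). This is my schema definition:

 eschema = StructType([StructField("src", StringType(), True), StructField("dst", StringType(), True)])

I read each line, split it on the delimiter (','), and convert each element to an int. Somehow this fails: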

 lines = sc.textFile(filename)
 lines = lines.map(lambda l : map(int, l.split(delim)))
 lines = lines.map(lambda l : Row(l[0], l[1]))

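When I run this, I get the error: StructType can not accept object 0 in type <type 'int'>. I am using Python 2.7 and Spark > 2.0. After splitting a line, the elements are of type unicode rather than str; does that make any difference? How can I solve this? Any suggestions would be a great help. Thanks.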


1 Answer

User

#1 · Posted 2024-04-23 12:26:20

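If the delimiter is ',', this is just a regular CSV file. Since you are using Spark > 2.0, you can use the modern DataFrame API: instead of the Spark context (conventionally sc), use the Spark session (conventionally spark):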

# Keep the "header" option only if the file starts with a header row.
# The schema is applied with the reader's .schema() method, not .option().
df = (spark.read.format("csv")
    .option("header", "true")
    .schema(eschema)
    .load(filename))

Instead of supplying the schema with .schema(eschema), you can use .option("inferSchema", "true"), which makes Spark guess the column types by looking at the data.
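If you prefer to stay with the RDD approach from the question, note that eschema declares both columns as StringType while the code converts every field to int, so the rows and the schema disagree. The error message names StructType rather than StringType, which may also mean that a whole record reached Spark as a bare int instead of a tuple or Row; that is an inference from the message, not something the post confirms. Below is a minimal sketch, assuming each line holds exactly two integer ids separated by the delimiter. The helper names parse_edge and load_edges are hypothetical and are not part of the original post.

```python
def parse_edge(line, delim=","):
    """Split one 'src<delim>dst' line and return both ids as an (int, int) tuple."""
    src, dst = line.split(delim)
    return int(src), int(dst)


def load_edges(spark, filename, delim=","):
    """Build a DataFrame of edges whose schema matches the parsed int values.

    Assumes `spark` is an existing SparkSession. pyspark is imported here so
    that parse_edge can be tried without a Spark installation.
    """
    from pyspark.sql.types import StructType, StructField, IntegerType

    # The columns are IntegerType because parse_edge yields ints.
    # Keeping StringType would require leaving the fields as strings instead.
    int_schema = StructType([
        StructField("src", IntegerType(), True),
        StructField("dst", IntegerType(), True),
    ])
    rows = spark.sparkContext.textFile(filename).map(
        lambda line: parse_edge(line, delim)
    )
    return spark.createDataFrame(rows, int_schema)
```

Each record is a two-element tuple, which a StructType with two fields accepts, and the field types match the values. Calling load_edges(spark, filename) should then give a DataFrame with integer src and dst columns.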
