Why does spark-xml on AWS Glue fail with AbstractMethodError?



I have an AWS Glue job written in Python that pulls in the spark-xml library (via the Dependent jars path). I am using spark-xml_2.11-0.2.0.jar. When I try to write a DataFrame out as XML, I get an error. The code I am using is:

applymapping1.toDF().repartition(1).write.format("com.databricks.xml").save("s3://glue.xml.output/Test.xml");

The error I get is:

"/mnt/yarn/usercache/root/appcache/application_1517883778506_0016/container_1517883778506_0016_02_000001/pyspark.zip/pyspark/sql/readwriter.py", line 550, in save File "/mnt/yarn/usercache/root/appcache/application_1517883778506_0016/container_1517883778506_0016_02_000001/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in call File "/mnt/yarn/usercache/root/appcache/application_1517883778506_0016/container_1517883778506_0016_02_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco File "/mnt/yarn/usercache/root/appcache/application_1517883778506_0016/container_1517883778506_0016_02_000001/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value py4j.protocol.Py4JJavaError: An error occurred while calling o75.save. : java.lang.AbstractMethodError: com.databricks.spark.xml.DefaultSource15.createRelation(Lorg/apache/spark/sql/SQLContext;Lorg/apache/spark/sql/SaveMode;Lscala/collection/immutable/Map;Lorg/apache/spark/sql/dataset;)Lorg/apache/spark/sql/sources/BaseRelation; at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426) at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215) at

If I change it to CSV, it works fine:

applymapping1.toDF().repartition(1).write.format("com.databricks.csv").save("s3://glue.xml.output/Test.xml");

Note: when using CSV I do not have to import spark-xml. I believe spark-csv is included in AWS Glue's Spark environment.

Any suggestions?

I have tried various versions of spark-xml:

spark-xml_2.11-0.2.0
spark-xml_2.11-0.3.1
spark-xml_2.10-0.2.0


1 Answer

This is very similar to (though not exactly a duplicate of) Why does elasticsearch-spark 5.5.0 give AbstractMethodError when submitting to YARN cluster?, which also deals with an AbstractMethodError.


Quoting the javadoc of java.lang.AbstractMethodError:

Thrown when an application tries to call an abstract method. Normally, this error is caught by the compiler; this error can only occur at run time if the definition of some class has incompatibly changed since the currently executing method was last compiled.

That explains what you are experiencing quite well (note the part that starts with "this error can only occur at run time").

I think the problem is a Spark version mismatch.
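
A quick way to confirm the mismatch is to log the Spark and Scala versions the Glue job actually runs on and compare them with the versions in the spark-xml artifact name (e.g. spark-xml_2.11-0.4.1 targets Scala 2.11 and Spark 2.x). A minimal sketch; note that the _jvm gateway access is an internal py4j convenience rather than a public API:

# Sketch: print the Spark and Scala versions this job is running on,
# so they can be compared against the spark-xml artifact you ship.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
print("Spark version:", spark.version)

# Scala version via the JVM gateway; _jvm is internal, used here only for debugging.
print("Scala version:", spark.sparkContext._jvm.scala.util.Properties.versionString())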

Given com.databricks.spark.xml.DefaultSource15 in the stack trace, and the change which does the following:

Remove the separated DefaultSource15 due to compatibility in Spark 1.5+

This removes DefaultSource15 and merge it into DefaultSource. This was separated for compatibility in Spark 1.5+ . In master and spark-xml 0.4.x, it dropped 1.x support.

You should make sure that the Spark version of AWS Glue's Spark environment matches the spark-xml version. The latest version of spark-xml, 0.4.1, was released on 6 Nov 2016.
