Unable to configure GeoSpark in a Spark session:

Posted 2024-04-27 10:57:52


I have been trying to configure GeoSpark with a Spark session so that I can run spatial applications on PySpark. I followed this {a1} and tried to run the code mentioned below:

try:
    import pyspark
    from pyspark import SparkContext, SparkConf
    from pyspark.sql import SparkSession, SQLContext
except ImportError as e:
    raise ImportError('PySpark is not configured') from e

print(f"PySpark Version : {pyspark.__version__}")

# Creating a Spark-Context
sc = SparkContext.getOrCreate(SparkConf().setMaster('local[*]').set("spark.ui.port", "4050"))
# Spark Builder
spark = SparkSession.builder.appName('GeoSparkDemo').config('spark.executor.memory', '5GB')\
    .getOrCreate()

from geospark.register import upload_jars
from geospark.register import GeoSparkRegistrator
upload_jars()
GeoSparkRegistrator.registerAll(spark)

When I run this file, it gives the following error:

Traceback (most recent call last):
  File "c:\sourav\spark\code\geospark_demo.py", line 29, in <module>
    GeoSparkRegistrator.registerAll(spark)
  File "C:\Users\user3.conda\envs\python37\lib\site-packages\geospark\register\geo_registrator.py", line 26, in registerAll
    cls.register(spark)
  File "C:\Users\user3.conda\envs\python37\lib\site-packages\geospark\register\geo_registrator.py", line 31, in register
    return spark._jvm.GeoSparkSQLRegistrator.registerAll(spark._jsparkSession)
TypeError: 'JavaPackage' object is not callable

I tried manually adding the following jar files to the Spark jars folder:

• geospark-1.3.1.jar
• geospark-sql_2.1-1.3.1.jar
• geo_wrapper.jar

Now the previous error is gone and a new exception is raised, as shown below:

Traceback (most recent call last):
  File "c:\sourav\spark\code\geospark_demo.py", line 29, in <module>
    GeoSparkRegistrator.registerAll(spark)
  File "C:\Users\user3.conda\envs\python37\lib\site-packages\geospark\register\geo_registrator.py", line 26, in registerAll
    cls.register(spark)
  File "C:\Users\user3.conda\envs\python37\lib\site-packages\geospark\register\geo_registrator.py", line 31, in register
    return spark._jvm.GeoSparkSQLRegistrator.registerAll(spark._jsparkSession)
  File "C:\Users\user3.conda\envs\python37\lib\site-packages\py4j\java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\sourav\spark\spark-2.4.7-bin-hadoop2.7\python\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\Users\user3.conda\envs\python37\lib\site-packages\py4j\protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.datasyslab.geosparksql.utils.GeoSparkSQLRegistrator.registerAll.
: java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.registerFunction(Ljava/lang/String;Lscala/Function1;)V
	at org.datasyslab.geosparksql.UDF.UdfRegistrator$$anonfun$registerAll$1.apply(UdfRegistrator.scala:29)
	at org.datasyslab.geosparksql.UDF.UdfRegistrator$$anonfun$registerAll$1.apply(UdfRegistrator.scala:29)
	at scala.collection.immutable.List.foreach(List.scala:392)
	at org.datasyslab.geosparksql.UDF.UdfRegistrator$.registerAll(UdfRegistrator.scala:29)
	at org.datasyslab.geosparksql.utils.GeoSparkSQLRegistrator$.registerAll(GeoSparkSQLRegistrator.scala:34)
	at org.datasyslab.geosparksql.utils.GeoSparkSQLRegistrator.registerAll(GeoSparkSQLRegistrator.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
	at java.lang.reflect.Method.invoke(Unknown Source)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Unknown Source)

I found this link describing a similar issue, and I even tried adding the jars in the Spark config file with the line below, but nothing seems to work:

spark.driver.extraClassPath C:\sourav\spark\geosparkjar/*
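For reference, a consistent spark-defaults.conf form of that setting (same assumed folder as above; note the single path-separator style on Windows and the matching executor entry, which is usually needed as well):

```
# spark-defaults.conf -- sketch; adjust the folder to your actual jar location
spark.driver.extraClassPath    C:\sourav\spark\geosparkjar\*
spark.executor.extraClassPath  C:\sourav\spark\geosparkjar\*
```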

I am using GeoSpark 1.3.1, Java 8, Python 3.7 and Apache Spark 2.4.7; my JAVA_HOME and SPARK_HOME are set correctly, and I am running on Windows 10.

How can I resolve this issue and move forward? Any help/suggestion would be appreciated.


1 Answer

GeoSpark is now available as Apache Sedona.

For a similar use case, I followed these steps:

pip install apache-sedona

from pyspark.sql import SparkSession
from sedona.register import SedonaRegistrator
from sedona.utils import KryoSerializer, SedonaKryoRegistrator

spark = SparkSession.builder.master("spark://test:7077").appName("sedonatest") \
    .config("spark.serializer", KryoSerializer.getName) \
    .config("spark.kryo.registrator", SedonaKryoRegistrator.getName) \
    .config('spark.jars.packages',
            'org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubating,'
            'org.datasyslab:geotools-wrapper:geotools-24.0') \
    .getOrCreate()
SedonaRegistrator.registerAll(spark)

resultsDF = spark.sql("SELECT ST_PolygonFromText('-74.0428197,40.6867969,-74.0421975,40.6921336,-74.0508020,40.6912794,-74.0428197,40.6867969', ',') AS polygonshape")
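As an aside, the first argument to ST_PolygonFromText above is just a flat comma-separated list of lon,lat values (the second argument is the delimiter). A plain-Python sketch of how that string decomposes into ring coordinates, with no Spark required:

```python
# Decompose the comma-separated coordinate string used with ST_PolygonFromText
# into (lon, lat) pairs -- pure Python, no Spark required.
coords = "-74.0428197,40.6867969,-74.0421975,40.6921336,-74.0508020,40.6912794,-74.0428197,40.6867969"
values = [float(v) for v in coords.split(",")]
points = list(zip(values[0::2], values[1::2]))  # pair up as (lon, lat) tuples

# A valid polygon ring is closed: the first point equals the last point.
assert points[0] == points[-1]
print(points)
```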

Note: during spark-submit, pass the following 2 jars with the --jars option:
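Alternatively, the same dependencies can be pulled at submit time with --packages, reusing the exact coordinates from the spark.jars.packages config in the snippet above (the script name here is a placeholder):

```
spark-submit \
  --packages org.apache.sedona:sedona-python-adapter-3.0_2.12:1.0.0-incubating,org.datasyslab:geotools-wrapper:geotools-24.0 \
  your_script.py
```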
