Pypark运动

2024-04-28 11:52:43 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在尝试执行spark文件夹中给定的kinesis_wordcount_asl.py文件,以便使用kinesis(https://spark.apache.org/docs/2.4.3/streaming-kinesis-integration.html)进行spark流式处理。我的Spark版本是2.4.3,Scala版本是2.11.12。基于此,我尝试使用以下所有命令执行它:

bin\spark-submit --jars jars\spark-streaming-kinesis-asl_2.11-2.4.3.jar external\kinesis-asl\src\main\python\examples\streaming\kinesis_wordcount_asl.py streamingapp kinesisstreamname https://kinesis.us-east-1.amazonaws.com us-east-1

bin\spark-submit --jars jars\spark-streaming-kinesis-asl-assembly_2.11-2.0.0.jar external\kinesis-asl\src\main\python\examples\streaming\kinesis_wordcount_asl.py streamingapp kinesisstreamname https://kinesis.us-east-1.amazonaws.com us-east-1

bin\spark-submit --packages org.apache.spark:spark-streaming-kinesis-asl_2.11:2.4.3 external\kinesis-asl\src\main\python\examples\streaming\kinesis_wordcount_asl.py streamingapp kinesisstreamname https://kinesis.us-east-1.amazonaws.com us-east-1

通过以上所有组合,我得到以下错误:

21/06/22 19:05:34 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/06/22 19:05:40 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
Traceback (most recent call last):
  File "C:/Spark/external/kinesis-asl/src/main/python/examples/streaming/kinesis_wordcount_asl.py", line 75, in <module>
    ssc, appName, streamName, endpointUrl, regionName, InitialPositionInStream.LATEST, 2)
  File "C:\Spark\python\lib\pyspark.zip\pyspark\streaming\kinesis.py", line 92, in createStream
  File "C:\Spark\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1257, in __call__
  File "C:\Spark\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py", line 332, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o26.createStream. Trace:
py4j.Py4JException: Method createStream([class org.apache.spark.streaming.api.java.JavaStreamingContext, class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.String, class java.lang.Integer, class org.apache.spark.streaming.Duration, class org.apache.spark.storage.StorageLevel, null, null, null, null, null]) does not exist
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:318)
        at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:326)
        at py4j.Gateway.invoke(Gateway.java:274)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:238)
        at java.lang.Thread.run(Thread.java:748)

我也尝试过用Jupyter笔记本执行它,但是得到了相同的错误。我已经读到,造成这种情况的主要原因是版本不匹配,但我也检查了这一点,并且我的Spark/Scala版本与我正在使用的JAR相匹配。在这方面的任何帮助都是非常感谢的。谢谢


Tags: pysrclangjavawordcountatclasskinesis