Error when running PySpark code in a Jupyter notebook on a Raspberry Pi cluster with Hadoop/Spark/Yarn

Posted 2024-04-25 07:47:57


I am trying to run the sample code from a Jupyter notebook with PySpark on a YARN cluster.
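
For context, here is a minimal sketch of how such a session is typically started from Jupyter against YARN (the app name is just a placeholder, and the actual notebook may set additional options):

from pyspark.sql import SparkSession

# Minimal sketch of a YARN-backed session started from Jupyter; the appName
# is a placeholder and the real notebook may configure more options.
spark = (
    SparkSession.builder
    .appName("picluster-notebook")
    .master("yarn")
    .getOrCreate()
)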

I think my cluster is working fine. I can see that all the nodes are running:

yarn node -list -all
OpenJDK Client VM warning: You have loaded library /opt/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Total Nodes:3
         Node-Id             Node-State Node-Http-Address       Number-of-Running-Containers
       pi2:40045                RUNNING          pi2:8042                                  0
       pi1:38067                RUNNING          pi1:8042                                  0
       pi3:35139                RUNNING          pi3:8042                                  0

I am using the sample notebook provided here: https://github.com/dmanning21h/pi-cluster/blob/master/notebooks/picluster.ipynb

I can run all of the notebook's code except for this cell:

# Show label counts by class
df.groupBy("label") \
    .count() \
    .orderBy(col("count").desc()) \
    .show()
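
The cell itself only relies on col from pyspark.sql.functions and on a DataFrame df with a label column, both of which come from earlier cells in the notebook. A self-contained sketch of what it does (the df here is a made-up stand-in, purely for illustration) looks like this:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Reuses the notebook's YARN session if one already exists.
spark = SparkSession.builder.getOrCreate()

# Hypothetical stand-in for the notebook's df; any DataFrame with a "label"
# column reproduces the same shuffle.
df = spark.createDataFrame([("a",), ("b",), ("a",), ("c",), ("a",)], ["label"])

# groupBy + count + orderBy triggers a shuffle, which is the step that fails below.
df.groupBy("label").count().orderBy(col("count").desc()).show()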

Spark gets stuck on this job with the following error:

Exception in thread "map-output-dispatcher-0" java.lang.UnsatisfiedLinkError: /opt/hadoop/lib/native/libzstd-jni.so: /opt/hadoop/lib/native/libzstd-jni.so: wrong ELF class: ELFCLASS64 (Possible cause: architecture word width mismatch)
Unsupported OS/arch, cannot find /linux/arm/libzstd-jni.so or load zstd-jni from system libraries. Please try building from source the jar or providing libzstd-jni in your system.
        at java.base/java.lang.ClassLoader$NativeLibrary.load0(Native Method)
        at java.base/java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2442)
        at java.base/java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2498)
        at java.base/java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2694)
        at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2659)
        at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
        at java.base/java.lang.System.loadLibrary(System.java:1873)
        at com.github.luben.zstd.util.Native.load(Native.java:73)
        at com.github.luben.zstd.util.Native.load(Native.java:60)
        at com.github.luben.zstd.ZstdOutputStream.<clinit>(ZstdOutputStream.java:15)
        at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:224)
        at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:913)
        at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:210)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
        at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:207)
        at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:457)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
Exception in thread "map-output-dispatcher-1" java.lang.NoClassDefFoundError: Could not initialize class com.github.luben.zstd.ZstdOutputStream
        at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:224)
        at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:913)
        at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:210)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
        at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:207)
        at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:457)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:834)
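
As far as I can tell from the trace, the driver is compressing the shuffle map-output statuses with the zstd codec: the libzstd-jni.so it finds under /opt/hadoop/lib/native is a 64-bit build, the JVM appears to be running as 32-bit ARM, and the zstd-jni jar has no /linux/arm native library to fall back on. Below is a sketch of the kind of codec override I would expect to be relevant, assuming Spark 3.x where spark.shuffle.mapStatus.compression.codec defaults to zstd (I have not confirmed that this alone fixes it):

from pyspark.sql import SparkSession

# Sketch only: steer Spark away from the zstd codec whose native library is
# not available for 32-bit ARM. spark.shuffle.mapStatus.compression.codec
# defaults to zstd in Spark 3.x, which is the code path in the trace above.
spark = (
    SparkSession.builder
    .appName("picluster-notebook")
    .master("yarn")
    .config("spark.shuffle.mapStatus.compression.codec", "lz4")
    # lz4 is already the Spark default here; set explicitly in case the
    # cluster configuration overrides it.
    .config("spark.io.compression.codec", "lz4")
    .getOrCreate()
)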

I have also tried the fix from this comment, but it did not work.


Tags: run, org, github, lang, base, apache, util, java