I am trying to run the example code from a Jupyter notebook with PySpark on a YARN cluster.
As far as I can tell, the cluster itself is working fine; all nodes show as RUNNING:
yarn node -list -all
OpenJDK Client VM warning: You have loaded library /opt/hadoop/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
pi2:40045 RUNNING pi2:8042 0
pi1:38067 RUNNING pi1:8042 0
pi3:35139 RUNNING pi3:8042 0
I am using the example notebook provided here: https://github.com/dmanning21h/pi-cluster/blob/master/notebooks/picluster.ipynb
I can run every cell in the notebook except this one:
# Show label counts by class
df.groupBy("label") \
.count() \
.orderBy(col("count").desc()) \
.show()
Spark gets stuck on this job, with the following errors:
Exception in thread "map-output-dispatcher-0" java.lang.UnsatisfiedLinkError: /opt/hadoop/lib/native/libzstd-jni.so: /opt/hadoop/lib/native/libzstd-jni.so: wrong ELF class: ELFCLASS64 (Possible cause: architecture word width mismatch)
Unsupported OS/arch, cannot find /linux/arm/libzstd-jni.so or load zstd-jni from system libraries. Please try building from source the jar or providing libzstd-jni in your system.
at java.base/java.lang.ClassLoader$NativeLibrary.load0(Native Method)
at java.base/java.lang.ClassLoader$NativeLibrary.load(ClassLoader.java:2442)
at java.base/java.lang.ClassLoader$NativeLibrary.loadLibrary(ClassLoader.java:2498)
at java.base/java.lang.ClassLoader.loadLibrary0(ClassLoader.java:2694)
at java.base/java.lang.ClassLoader.loadLibrary(ClassLoader.java:2659)
at java.base/java.lang.Runtime.loadLibrary0(Runtime.java:830)
at java.base/java.lang.System.loadLibrary(System.java:1873)
at com.github.luben.zstd.util.Native.load(Native.java:73)
at com.github.luben.zstd.util.Native.load(Native.java:60)
at com.github.luben.zstd.ZstdOutputStream.<clinit>(ZstdOutputStream.java:15)
at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:224)
at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:913)
at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:210)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:207)
at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:457)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Exception in thread "map-output-dispatcher-1" java.lang.NoClassDefFoundError: Could not initialize class com.github.luben.zstd.ZstdOutputStream
at org.apache.spark.io.ZStdCompressionCodec.compressedOutputStream(CompressionCodec.scala:224)
at org.apache.spark.MapOutputTracker$.serializeMapStatuses(MapOutputTracker.scala:913)
at org.apache.spark.ShuffleStatus.$anonfun$serializedMapStatus$2(MapOutputTracker.scala:210)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.ShuffleStatus.withWriteLock(MapOutputTracker.scala:72)
at org.apache.spark.ShuffleStatus.serializedMapStatus(MapOutputTracker.scala:207)
at org.apache.spark.MapOutputTrackerMaster$MessageLoop.run(MapOutputTracker.scala:457)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
I also tried the fix suggested in this comment, but it didn't work.
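For reference, this is the kind of override I experimented with. Since the stack trace shows the failure happening while Spark serializes map output statuses (which Spark 3.x compresses with zstd by default), switching the relevant codecs away from zstd should avoid loading libzstd-jni.so at all. This is a sketch, not a confirmed fix for 32-bit ARM; the property names are from the Spark 3.x configuration docs:

```
# spark-defaults.conf (or pass each line via --conf on spark-submit):
# avoid the zstd-jni native library, which on this 32-bit ARM JVM is a
# 64-bit .so (the "wrong ELF class: ELFCLASS64" in the trace above)
spark.shuffle.mapStatus.compression.codec  lz4
spark.io.compression.codec                 lz4
```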