Exception in thread "Driver" org.apache.spark.SparkUserAppException: User application exited with 1

Published 2024-06-16 09:27:31


I'm trying to run a hello-world style Python Spark application on Bluemix:

from __future__ import print_function

import sys
from operator import add
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext(appName="PythonWordCount")
    lines = sc.textFile(sys.argv[1], 1)
    counts = lines.flatMap(lambda x: x.split(' ')) \
                  .map(lambda x: (x, 1)) \
                  .reduceByKey(add)
    output = counts.collect()
    for (word, count) in output:
        print("%s: %i" % (word, count))

    sc.stop()
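The transformation chain above (flatMap to split lines into words, map to pair each word with 1, reduceByKey(add) to sum per word) can be sketched in plain Python with list-based stand-ins, to show what each stage produces. This is illustrative only, not the Spark API:

```python
from collections import defaultdict
from operator import add

lines = ["to be or not to be", "that is the question"]

# flatMap(lambda x: x.split(' ')): one flat list of words
words = [w for line in lines for w in line.split(' ')]

# map(lambda x: (x, 1)): pair every word with a count of 1
pairs = [(w, 1) for w in words]

# reduceByKey(add): merge the counts for each distinct word
counts = defaultdict(int)
for word, n in pairs:
    counts[word] = add(counts[word], n)

for word, count in sorted(counts.items()):
    print("%s: %i" % (word, count))
```

On a real RDD these stages run distributed and lazily; nothing executes until `collect()` pulls the results to the driver.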

The command I'm running is:

^{pr2}$

The LICENSE file is just the standard Apache 2.0 license.

However, I get an error:

snowch$ cat stderr_1463086843N
Spark Command: /usr/local/src/spark160master/ibm-java-x86_64-80/bin/java -cp /usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/ego/spark-launcher_2.10-1.6.0.jar:/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/ego/spark-network-shuffle_2.10-1.6.0.jar:/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/ego/gson-2.2.4.jar:/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/ego/guava-14.0.1.jar:/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/ego/Java-WebSocket-1.3.0.jar:/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/ego/spark-ego_2.10-1.6.0.jar:/usr/local/src/spark160master/spark/profile/batch/:/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/lib/spark-assembly-1.6.0-hadoop2.6.0.jar:/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/lib/datanucleus-api-jdo-3.2.6.jar:/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/lib/datanucleus-core-3.2.10.jar:/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/lib/datanucleus-rdbms-3.2.9.jar:/usr/local/src/data-connectors-1.4.1/*:/usr/local/src/analytic-libs/spark-1.6.0/* -Dspark.service.plan_name=ibm.SparkService.PayGoPersonalInteractive -Dspark.eventLog.enabled=true -Dspark.files=/gpfs/fs01/user/XXXXXX/data/YYYYYY/LICENSE,/gpfs/fs01/user/XXXXXX/data/YYYYYY/wordcount.py -Dspark.driver.extraClassPath=/gpfs/fs01/user/XXXXXX/data/libs/*: -Dspark.eventLog.dir=/gpfs/fs01/user/XXXXXX/events -Dspark.service.hashed_tenant_id=9kowbr9dfmU1t/2Hi9NcNo8gscOc+1oEHPakfA== -Dspark.app.name=wordcount.py -Dspark.executor.memory=1024m -Dspark.driver.extraLibraryPath=/gpfs/fs01/user/XXXXXX/data/libs/*: -Dspark.service.spark_version=1.6.0 -Dspark.executor.extraLibraryPath=/gpfs/fs01/user/XXXXXX/data/libs/*: -Dspark.master=spark://yp-spark-dal09-env5-0018:7083 -Dspark.executor.extraClassPath=/gpfs/fs01/user/XXXXXX/data/libs/*: -Dspark.files.useFetchCache=false -Dspark.shuffle.service.port=7340 -Xms512m -Xmx512m org.apache.spark.deploy.ego.EGOClusterDriverWrapper {{WORKER_URL}} /gpfs/fs01/user/XXXXXX/data/YYYYYY/wordcount.py 
org.apache.spark.deploy.PythonRunner --primary-py-file wordcount.py LICENSE
========================================
log4j:ERROR Could not find value for key log4j.appender.FILE
log4j:ERROR Could not instantiate appender named "FILE".
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/src/spark160master/spark-1.6.0-bin-2.6.0/lib/spark-assembly-1.6.0-hadoop2.6.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/analytic-libs/spark-1.6.0/tika-app-1.11.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/src/analytic-libs/spark-1.6.0/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/05/12 16:00:51 INFO deploy.ego.EGOClusterDriverWrapper: Registered signal handlers for [TERM, HUP, INT]
16/05/12 16:00:52 WARN hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/12 16:00:52 INFO apache.spark.SecurityManager: Changing view acls to: XXXXXX
16/05/12 16:00:52 INFO apache.spark.SecurityManager: Changing modify acls to: XXXXXX
16/05/12 16:00:52 INFO apache.spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(XXXXXX); users with modify permissions: Set(XXXXXX)
16/05/12 16:00:52 INFO spark.util.Utils: Successfully started service 'EGOClusterDriverWrapper-driver-20160512160048-0013-e37762f0-0cfc-4763-ac2d-ac829f2612c1' on port 48576.
16/05/12 16:00:53 INFO apache.spark.SecurityManager: Changing view acls to: XXXXXX
16/05/12 16:00:53 INFO apache.spark.SecurityManager: Changing modify acls to: XXXXXX
16/05/12 16:00:53 INFO apache.spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(XXXXXX); users with modify permissions: Set(XXXXXX)
16/05/12 16:00:53 INFO deploy.ego.EGOClusterDriverWrapper: Fetching file from /gpfs/fs01/user/XXXXXX/data/YYYYYY/LICENSE to /gpfs/fs01/user/XXXXXX/data/workdir/AAAAAA/LICENSE
16/05/12 16:00:53 INFO spark.util.Utils: Copying /gpfs/fs01/user/XXXXXX/data/YYYYYY/LICENSE to /gpfs/fs01/user/XXXXXX/data/workdir/AAAAAA/LICENSE
16/05/12 16:00:53 INFO deploy.ego.EGOClusterDriverWrapper: Fetching file from /gpfs/fs01/user/XXXXXX/data/YYYYYY/wordcount.py to /gpfs/fs01/user/XXXXXX/data/workdir/AAAAAA/wordcount.py
16/05/12 16:00:53 INFO spark.util.Utils: Copying /gpfs/fs01/user/XXXXXX/data/YYYYYY/wordcount.py to /gpfs/fs01/user/XXXXXX/data/workdir/AAAAAA/wordcount.py
16/05/12 16:00:53 INFO deploy.ego.EGOClusterDriverWrapper: Starting the user JAR in a separate Thread
16/05/12 16:00:53 INFO deploy.ego.EGOClusterDriverWrapper: Waiting for spark context initialization ... 0
16/05/12 16:00:54 INFO apache.spark.SparkContext: Running Spark version 1.6.0
16/05/12 16:00:54 INFO apache.spark.SecurityManager: Changing view acls to: XXXXXX
16/05/12 16:00:54 INFO apache.spark.SecurityManager: Changing modify acls to: XXXXXX
16/05/12 16:00:54 INFO apache.spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(XXXXXX); users with modify permissions: Set(XXXXXX)
16/05/12 16:00:54 INFO spark.util.Utils: Successfully started service 'sparkDriver' on port 45352.
16/05/12 16:00:54 INFO apache.spark.SparkEnv: The address of rpcenv is :10.142.18.197:45352
16/05/12 16:00:54 INFO event.slf4j.Slf4jLogger: Slf4jLogger started
16/05/12 16:00:54 INFO Remoting: Starting remoting
16/05/12 16:00:54 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.142.18.197:46562]
16/05/12 16:00:54 INFO spark.util.Utils: Successfully started service 'sparkDriverActorSystem' on port 46562.
16/05/12 16:00:54 INFO apache.spark.SparkEnv: Registering MapOutputTracker
16/05/12 16:00:54 INFO apache.spark.SparkEnv: Registering BlockManagerMaster
16/05/12 16:00:54 INFO spark.storage.DiskBlockManager: Created local directory at /gpfs/global_fs01/sym_shared/YPProdSpark/user/XXXXXX/data/workdir/EEEEEE
16/05/12 16:00:54 INFO spark.storage.MemoryStore: MemoryStore started with capacity 159.0 MB
16/05/12 16:00:54 INFO apache.spark.SparkEnv: Registering OutputCommitCoordinator
16/05/12 16:00:54 INFO jetty.server.Server: jetty-8.y.z-SNAPSHOT
16/05/12 16:00:54 INFO jetty.server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:53308
16/05/12 16:00:54 INFO spark.util.Utils: Successfully started service 'SparkUI' on port 53308.
16/05/12 16:00:54 INFO spark.ui.SparkUI: Started SparkUI at http://10.142.18.197:53308
16/05/12 16:00:54 INFO apache.spark.HttpFileServer: HTTP File server directory is /gpfs/global_fs01/sym_shared/YPProdSpark/user/XXXXXX/data/workdir/CCCCCC/DDDDDD
16/05/12 16:00:54 INFO apache.spark.HttpServer: Starting HTTP Server
16/05/12 16:00:54 INFO jetty.server.Server: jetty-8.y.z-SNAPSHOT
16/05/12 16:00:54 INFO jetty.server.AbstractConnector: Started SocketConnector@0.0.0.0:50862
16/05/12 16:00:54 INFO spark.util.Utils: Successfully started service 'HTTP file server' on port 50862.
16/05/12 16:00:54 INFO spark.util.Utils: Copying /gpfs/fs01/user/XXXXXX/data/YYYYYY/LICENSE to /gpfs/global_fs01/sym_shared/YPProdSpark/user/XXXXXX/data/workdir/CCCCCC/BBBBBB/LICENSE
16/05/12 16:00:54 INFO apache.spark.SparkContext: Added file /gpfs/fs01/user/XXXXXX/data/YYYYYY/LICENSE at http://10.142.18.197:50862/files/LICENSE with timestamp 1463086854983
16/05/12 16:00:54 INFO spark.util.Utils: Copying /gpfs/fs01/user/XXXXXX/data/YYYYYY/wordcount.py to /gpfs/global_fs01/sym_shared/YPProdSpark/user/XXXXXX/data/workdir/CCCCCC/BBBBBB/wordcount.py
16/05/12 16:00:54 INFO apache.spark.SparkContext: Added file /gpfs/fs01/user/XXXXXX/data/YYYYYY/wordcount.py at http://10.142.18.197:50862/files/wordcount.py with timestamp 1463086854987
16/05/12 16:00:55 INFO spark.util.EGOSparkDockerConfig: Docker not enabled
16/05/12 16:00:55 INFO cluster.ego.EGOFineGrainedSchedulerBackend: setting reserve=0, priority=1, limit=2147483647,  master=spark://yp-spark-dal09-env5-0018:7083
16/05/12 16:00:55 INFO client.ego.EGOAppClient$ClientEndpoint: Connecting to master spark://yp-spark-dal09-env5-0018:7083...
16/05/12 16:00:55 INFO cluster.ego.EGOFineGrainedSchedulerBackend: Connected to Spark cluster with app ID ZZZZZZ
16/05/12 16:00:55 INFO cluster.ego.EGOFineGrainedSchedulerBackend: Application registered successfully as ZZZZZZ
16/05/12 16:00:55 INFO spark.util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40418.
16/05/12 16:00:55 INFO network.netty.NettyBlockTransferService: Server created on 40418
16/05/12 16:00:55 INFO spark.storage.BlockManagerMaster: Trying to register BlockManager
16/05/12 16:00:55 INFO spark.storage.BlockManagerMasterEndpoint: Registering block manager 10.142.18.197:40418 with 159.0 MB RAM, BlockManagerId(driver, 10.142.18.197, 40418)
16/05/12 16:00:55 INFO spark.storage.BlockManagerMaster: Registered BlockManager
16/05/12 16:00:55 INFO spark.scheduler.EventLoggingListener: Logging events to file:/gpfs/fs01/user/XXXXXX/events/ZZZZZZ
16/05/12 16:00:55 INFO cluster.ego.EGODeployScheduler: Spark context initialized.
16/05/12 16:00:55 INFO spark.storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 225.5 KB, free 225.5 KB)
16/05/12 16:00:55 INFO spark.storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 19.5 KB, free 245.0 KB)
16/05/12 16:00:55 INFO spark.storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.142.18.197:40418 (size: 19.5 KB, free: 159.0 MB)
16/05/12 16:00:55 INFO apache.spark.SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2
16/05/12 16:00:55 INFO deploy.ego.EGOClusterDriverWrapper: Final app status: 1, exitCode: 15, (reason: User class threw exception: User application exited with 1)
16/05/12 16:00:55 INFO deploy.ego.EGOClusterDriverWrapper: Sending driver program state to master
Exception in thread "Driver" org.apache.spark.SparkUserAppException: User application exited with 1
    at org.apache.spark.deploy.PythonRunner$.main(PythonRunner.scala:88)
    at org.apache.spark.deploy.PythonRunner.main(PythonRunner.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
    at java.lang.reflect.Method.invoke(Method.java:507)
    at org.apache.spark.deploy.ego.EGOClusterDriverWrapper$$anon$3.run(EGOClusterDriverWrapper.scala:430)
    ...

Any ideas?


1 Answer

If a job argument is an input file (e.g. the local file ./LICENSE), you need to:

  1. Add the local file's path to the --files option list.

  2. Prefix the input-file argument with file://.

In your case, it would look like this:

./spark-submit.sh --vcap ./vcap.json --deploy-mode cluster --master \
              https://x.x.x.x:8443 --files ./LICENSE wordcount.py file://LICENSE
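The exit code 1 most likely just means the Python driver died because `sys.argv[1]` did not resolve to a readable path, which PythonRunner then surfaces as SparkUserAppException. A stdlib-only sketch of validating the argument before building the RDD, including stripping the file:// scheme (the helper names here are mine, not Spark's):

```python
import os
import sys

FILE_SCHEME = "file://"

def to_local_path(arg):
    """Strip a leading file:// scheme so the path can be checked on disk.

    Illustrative helper only; Spark does its own URI handling internally.
    """
    if arg.startswith(FILE_SCHEME):
        return arg[len(FILE_SCHEME):]
    return arg

def checked_input(argv):
    """Fail fast with a readable message instead of a bare exit code 1."""
    if len(argv) < 2:
        raise SystemExit("usage: wordcount.py <input-file>")
    path = to_local_path(argv[1])
    if not os.path.exists(path):
        raise SystemExit("input file not found on driver: %s" % path)
    return argv[1]
```

Calling something like `checked_input(sys.argv)` before `sc.textFile(...)` turns the opaque "User application exited with 1" into an error message that names the missing path.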
