如何在Spark中连接主节点或解决错误：“WARN TaskSchedulerImpl: 初始作业未接受任何资源”

Question

请告诉我如何解决以下问题。

首先，我确认当主节点是“local”时，以下代码可以正常运行。

然后，我启动了两个EC2实例（m1.large）。但是，当主节点设置为“spark://MASTER_PUBLIC_DNS:7077”时，出现了错误信息“TaskSchedulerImpl”，并且运行失败。

当我把主节点地址改成一个无效的地址（spark://INVALID_DNS:7077）时，依然出现同样的错误信息。

也就是说，“WARN TaskSchedulerImpl: 初始任务没有接受到任何资源；请检查你的集群界面，确保工作节点已注册并且有足够的内存。”

这看起来像是这个。根据这个评论，我给这个集群分配了12G的内存，但还是失败了。

#!/usr/bin/env python                                                                                     
# -*- coding: utf-8 -*- 
from pyspark import SparkContext, SparkConf 
from pyspark.mllib.classification import LogisticRegressionWithSGD 
from pyspark.mllib.regression import LabeledPoint 
from numpy import array 

# Load and parse the data 
def parsePoint(line): 
  values = [float(x) for x in line.split(' ')] 
  return LabeledPoint(values[0], values[1:]) 
appName = "testsparkapp" 
master = "spark://MASTER_PUBLIC_DNS:7077" 
#master = "local" 


conf = SparkConf().setAppName(appName).setMaster(master) 
sc = SparkContext(conf=conf) 

data = sc.textFile("/root/spark/mllib/data/sample_svm_data.txt") 
parsedData = data.map(parsePoint) 

# Build the model 
model = LogisticRegressionWithSGD.train(parsedData) 

# Evaluating the model on training data 
labelsAndPreds = parsedData.map(lambda p: (p.label, model.predict(p.features))) 
trainErr = labelsAndPreds.filter(lambda (v, p): v != p).count() / float(parsedData.count()) 
print("Training Error = " + str(trainErr))

补充说明

我做了三个朋友建议我的任务。

1. 我打开了主节点的7077端口。

2. 在主节点的URL中，设置了主机名而不是IP地址。

->因此，我能够连接到主服务器（我通过集群界面检查了这一点）。

3. 我尝试设置worker_max_heap，像下面这样，但可能还是失败了。

ScalaConf().set("spark.executor.memory", "4g").set("worker_max_heapsize","2g")

工作节点允许我使用6.3GB的内存（我通过界面检查了）。这是m1.large实例。

->我在执行日志中注意到一个警告，并在工作节点的错误输出中看到一个错误。

我的执行日志

14/08/08 06:11:59 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

工作节点错误输出

14/08/08 06:14:04 INFO worker.WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@PRIVATE_HOST_NAME1:52011/user/Worker
14/08/08 06:15:07 ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@PRIVATE_HOST_NAME1:52201] -> [akka.tcp://spark@PRIVATE_HOST_NAME2:38286] disassociated! Shutting down.

error handling spark task scheduling distributed computing resource allocation cluster management ec2 instances memory configuration

如何在Spark中连接主节点或解决错误：“WARN TaskSchedulerImpl: 初始作业未接受任何资源”

补充说明

我的执行日志

工作节点错误输出

1 个回答

撰写回答