
How to get a Spark application ID in Java using the main class and user

I am using Hadoop 2.0 on Cloudera and am writing a Java program that should check whether a specific Spark application ID is running and then take further steps.

I have the main class and the user (account) information that triggered the specific Spark application.

I know about the command below, but is there any Java API that can help parse its output and match against the main class name and user name?

yarn application -list

Can we get the application ID of a running Spark application from another Java program by using the main class and user information?

[EDIT] One approach is to issue the following command:

yarn application -list -appStates RUNNING | grep $application_name | grep $user | cut -f 1

If there is any Java API that can simplify this, please share.
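
In the meantime, one way to drive that command from plain Java is ProcessBuilder. A rough sketch (the application name and user are placeholder values, and the pipeline is run through bash -c):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class YarnCliCheck {
    public static void main(String[] args) throws Exception {
        // placeholder application name and user, for illustration only
        String cmd = "yarn application -list -appStates RUNNING"
                + " | grep MY_APP_NAME | grep MY_USER | cut -f 1";
        Process p = new ProcessBuilder("bash", "-c", cmd).start();
        try (BufferedReader out = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = out.readLine()) != null) {
                System.out.println(line); // each matching application id
            }
        }
        p.waitFor();
    }
}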

The [EDIT] command above works fine, but I also tried using YarnClient as follows:

public class Check {

    public boolean run(String account, String appName) throws YarnException, IOException {

        SparkContext sc = new SparkContext(new SparkConf().setMaster("yarn").setAppName("SomeCheck"));
        YarnConfiguration conf = new YarnConfiguration(SparkHadoopUtil.get().newConfiguration(sc.getConf()));

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();
        EnumSet<YarnApplicationState> states =
                  EnumSet.of(YarnApplicationState.ACCEPTED, YarnApplicationState.RUNNING);

        List<ApplicationReport> applications = yarnClient.getApplications(states);


        for (ApplicationReport application : applications) {
               if ((application.getUser() == account) & (application.getName() == appName)) return true;
        }

        return false;

    }

}

It fails at the line SparkContext sc = new SparkContext(new SparkConf().setMaster("yarn").setAppName("SomeCheck"));

Error:

 ERROR spark.SparkContext: Error initializing SparkContext.
com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'

Is there anything incorrect in the code?


2 Answers

  1. # Answer 1

    The approach suggested by Vijay above works for the currently running application.

    But it seems your requirement is to get all the applications.


    Question: is there any Java API that can help parse and match using the main class name and user name?

    Refer to the Hadoop documentation for YarnClient here.

    Basically, YarnClient's getApplications will fetch all the applications.

    abstract List<ApplicationReport> getApplications(EnumSet<YarnApplicationState> applicationStates): Get a report (ApplicationReport) of Applications matching the given application states in the cluster.

    You can try something like this, which periodically prints all the applications:

    import java.util.List;

    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkContext;
    import org.apache.spark.deploy.SparkHadoopUtil;

    public class YarnMonitor {
        public static void main(String[] args) throws Exception {
            SparkContext sc = new SparkContext(new SparkConf().setMaster("yarn").setAppName("Yarn Monitor"));
            YarnConfiguration yarnConf = new YarnConfiguration(SparkHadoopUtil.get().newConfiguration(sc.getConf()));

            // create, configure and start the client once, outside the loop
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(yarnConf);
            yarnClient.start();

            while (true) { // periodically loop and get currently running apps
                List<ApplicationReport> applications = yarnClient.getApplications();

                for (ApplicationReport application : applications) {
                    System.out.println(application.getName());
                }
                Thread.sleep(1000); // sleep for 1000 ms
            }
        }
    }
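
    If the SparkContext itself is what fails with the akka.version error, note that the YarnClient does not actually need a SparkContext. A minimal sketch, assuming the cluster's yarn-site.xml/core-site.xml are on the classpath (and using equals() for the string comparison instead of ==):

    import java.io.IOException;
    import java.util.EnumSet;
    import java.util.List;

    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.api.records.YarnApplicationState;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.exceptions.YarnException;

    public class Check {
        public boolean run(String account, String appName) throws YarnException, IOException {
            // new YarnConfiguration() loads yarn-site.xml/core-site.xml from the classpath,
            // so no SparkContext (and no akka) is involved at all
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(new YarnConfiguration());
            yarnClient.start();

            EnumSet<YarnApplicationState> states =
                    EnumSet.of(YarnApplicationState.ACCEPTED, YarnApplicationState.RUNNING);
            List<ApplicationReport> applications = yarnClient.getApplications(states);

            boolean found = false;
            for (ApplicationReport application : applications) {
                // compare strings with equals(), not ==
                if (application.getUser().equals(account) && application.getName().equals(appName)) {
                    found = true;
                    break;
                }
            }
            yarnClient.stop();
            return found;
        }
    }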
    
  2. # Answer 2

    You can get the application id within the Spark application code itself.

    Below is a sample code snippet (shown in Java; the Scala API is the same):

    // create spark configuration
    SparkConf conf = new SparkConf().setMaster("local");
    conf.set("spark.app.name", "test");
    
    // create a spark context
    SparkContext sc = new SparkContext(conf);
    
    // get the application id
    String appId = sc.applicationId();
    
    // print the application id
    System.out.println("Application id:  " + appId);
    
    // stop the spark context
    sc.stop();
    

    Please try this.
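
    If you then need to check that id from a separate monitoring program, a rough sketch of looking it up through the YarnClient could be the following (the application id string is a placeholder for the value printed above, and ConverterUtils.toApplicationId is the Hadoop 2.x way to parse it):

    import org.apache.hadoop.yarn.api.records.ApplicationId;
    import org.apache.hadoop.yarn.api.records.ApplicationReport;
    import org.apache.hadoop.yarn.api.records.YarnApplicationState;
    import org.apache.hadoop.yarn.client.api.YarnClient;
    import org.apache.hadoop.yarn.conf.YarnConfiguration;
    import org.apache.hadoop.yarn.util.ConverterUtils;

    public class AppIdStatus {
        public static void main(String[] args) throws Exception {
            YarnClient yarnClient = YarnClient.createYarnClient();
            yarnClient.init(new YarnConfiguration());
            yarnClient.start();

            // placeholder id; use the value printed by sc.applicationId() above
            ApplicationId appId = ConverterUtils.toApplicationId("application_1234567890123_0001");
            ApplicationReport report = yarnClient.getApplicationReport(appId);
            System.out.println("Running? " + (report.getYarnApplicationState() == YarnApplicationState.RUNNING));

            yarnClient.stop();
        }
    }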