PySpark program fails to run in Oozie because of an import error

Posted 2024-04-19 10:23:03


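I am trying to run a simple PySpark program in Oozie, but it does not run successfully. The program loads a file from HDFS into an RDD, converts the RDD to a DataFrame, and then converts the DataFrame to pandas. The error occurs when pandas is imported. I installed the Anaconda Python distribution on the Cloudera VM, version 5.4.2, and added the Anaconda directory (/home/cloudera/anaconda/bin/) to my system PATH. Here is the output of echo $PATH: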

/home/cloudera/anaconda/bin:/home/cloudera/anaconda/bin:/usr/local/firefox:/sbin:/usr/java/jdk1.7.0_67-cloudera/bin:/usr/local/apache-ant/apache-ant-1.9.2/bin:/usr/local/apache-maven/apache-maven-3.0.4/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/cloudera/bin

The program runs fine from the CLI, but it fails when I run it as an Oozie job.

Here is the error log I get:

Stdoutput /usr/lib/spark/python/pyspark/sql/context.py:156: UserWarning: Using RDD of dict to inferSchema is deprecated,please use pyspark.sql.Row instead
Stdoutput warnings.warn("Using RDD of dict to inferSchema is deprecated,"
Stdoutput Traceback (most recent call last):
Stdoutput File "/home/cloudera/Dataframe/apps/shell/lib/test.py", line 48, in
Stdoutput cleand = inputRDD.toPandas()
Stdoutput File "/usr/lib/spark/python/pyspark/sql/dataframe.py", line 717, in toPandas
Stdoutput import pandas as pd
Stdoutput ImportError: No module named pandas
Exit code of the Shell command 1
<<< Invocation of Shell command completed <<<
<<< Invocation of Main class completed <<<
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]
Oozie Launcher failed, finishing Hadoop job gracefully
Oozie Launcher, uploading action data to HDFS sequence file: hdfs://quickstart.cloudera:8020/user/cloudera/oozie-oozi/0000000-150724190306010-oozie-oozi-W/shell-node--shell/action-data.seq
Oozie Launcher ends
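
The traceback shows that the interpreter running test.py under the Oozie shell action cannot find pandas, even though the same script works from the CLI. One way to check whether the two runs use the same Python is a small diagnostic like the one below. This is only a sketch: the helper name `describe_interpreter` is hypothetical, and treating a different interpreter as the cause is an assumption, not something the log confirms. `PYSPARK_PYTHON` is the environment variable Spark reads to choose the Python executable.

```python
from __future__ import print_function

import os
import sys


def describe_interpreter(module_name="pandas"):
    """Report which Python is running and whether module_name can be imported.

    Hypothetical helper for illustration; it is not part of PySpark or Oozie.
    """
    try:
        __import__(module_name)
        importable = True
    except ImportError:
        importable = False
    return {
        "executable": sys.executable,          # interpreter actually in use
        "version": sys.version.split()[0],     # e.g. "2.7.x" or "3.x.y"
        "path_env": os.environ.get("PATH", ""),
        "pyspark_python": os.environ.get("PYSPARK_PYTHON", ""),
        "module": module_name,
        "importable": importable,
    }


if __name__ == "__main__":
    info = describe_interpreter()
    for key in sorted(info):
        print("{0}: {1}".format(key, info[key]))
```

Running this once from the CLI and once through the Oozie shell action, then comparing the `executable` and `path_env` values, would show whether the Oozie launcher sees /home/cloudera/anaconda/bin at all.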

Can anyone help me get this program to run successfully?

Thanks.

