Python mrjob module not found on a CDH virtual machine

Posted 2024-04-19 01:04:14


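I am using mrjob to run Python code on Hadoop, with the CDH package on a virtual machine set up as a single-node cluster. mrjob runs fine when I test the code locally, but when I run it on the Hadoop cluster it throws an error: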

No module named mrjob

When I drop the "sudo" in front of python, I get the following output instead:

no configs found; falling back on auto-configuration
no configs found; falling back on auto-configuration
creating tmp directory /tmp/main_mrjob.cloudera.20131022.180113.820659
writing wrapper script to /tmp/main_mrjob.cloudera.20131022.180113.820659/setup-wrapper.sh
STDERR: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
STDERR: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.util.PlatformName
STDERR:     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
STDERR:     at java.security.AccessController.doPrivileged(Native Method)
STDERR:     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
STDERR:     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
STDERR:     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
STDERR:     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
STDERR: Could not find the main class: org.apache.hadoop.util.PlatformName.  Program will exit.
STDERR: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FsShell
STDERR: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FsShell
STDERR:     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
STDERR:     at java.security.AccessController.doPrivileged(Native Method)
STDERR:     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
STDERR:     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
STDERR:     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
STDERR:     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
STDERR: Could not find the main class: org.apache.hadoop.fs.FsShell.  Program will exit.
Traceback (most recent call last):
  File "main_mrjob.py", line 17, in <module>
    MRWordFrequencyCount.run()
  File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/job.py", line 500, in run
    mr_job.execute()
  File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/job.py", line 518, in execute
    super(MRJob, self).execute()
  File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/launch.py", line 146, in execute
    self.run_job()
  File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/launch.py", line 207, in run_job
    runner.run()
  File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/hadoop.py", line 236, in _run
    self._upload_local_files_to_hdfs()
  File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/hadoop.py", line 263, in _upload_local_files_to_hdfs
    self._mkdir_on_hdfs(self._upload_mgr.prefix)
  File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/hadoop.py", line 271, in _mkdir_on_hdfs
    self.invoke_hadoop(['fs', '-mkdir', path])
  File "/home/cloudera/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/mrjob/fs/hadoop.py", line 104, in invoke_hadoop
    raise CalledProcessError(proc.returncode, args)
subprocess.CalledProcessError: Command '['/usr/lib/hadoop-0.20-mapreduce/bin/hadoop', 'fs', '-mkdir', 'hdfs:///user/cloudera/tmp/mrjob/main_mrjob.cloudera.20131022.180113.820659/files/']' returned non-zero exit status 1

It seems I cannot "mkdir" on HDFS without sudo, but with sudo it cannot find mrjob. I am really confused...
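
One possible explanation (an assumption, not something confirmed in this thread) is that `sudo python` resolves to a different interpreter than the Canopy Python 2.7 shown in the traceback, and that interpreter has no mrjob installed. A minimal sketch to check this is to save the script below and run it once with and once without `sudo`, then compare the output. The function name `describe_interpreter` is made up for illustration.

```python
import sys


def describe_interpreter(module_name="mrjob"):
    """Report which interpreter is running and whether it can import a module."""
    try:
        module = __import__(module_name)
    except ImportError:
        # This interpreter cannot see the module at all.
        return {"executable": sys.executable, "found": False, "location": None}
    return {
        "executable": sys.executable,
        "found": True,
        # Built-in modules have no __file__, hence the getattr default.
        "location": getattr(module, "__file__", None),
    }


if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("%s: %s" % (key, value))
```

If the two runs print different `executable` paths, the "No module named mrjob" error would come from the interpreter choice rather than from Hadoop itself.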

Thanks a lot!


1 answer
Answer 1 (anonymous user) · posted 2024-04-19 01:04:14

I ran into the same problem when using the Cloudera QuickStart VM.

The solution was:

  1. Set HADOOP_HOME to /usr/lib/hadoop:

    export HADOOP_HOME=/usr/lib/hadoop
    
  2. Create a symbolic link to hadoop-streaming.jar:

    sudo ln -s /usr/lib/hadoop-mapreduce/hadoop-streaming.jar /usr/lib/hadoop
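
To confirm that both steps took effect before re-running the job, a small check like the following may help. It is only a sketch: `check_hadoop_setup` is a hypothetical helper, and it merely mirrors the two steps above (HADOOP_HOME is set, and hadoop-streaming.jar is reachable under it). It does not replicate how mrjob itself locates the jar.

```python
import os


def check_hadoop_setup(environ=None, jar_name="hadoop-streaming.jar"):
    """Return a list of problems with the HADOOP_HOME setup (empty if none)."""
    environ = os.environ if environ is None else environ
    problems = []
    hadoop_home = environ.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is not set")
        return problems
    jar_path = os.path.join(hadoop_home, jar_name)
    # os.path.exists follows symlinks, so a dangling link counts as missing.
    if not os.path.exists(jar_path):
        problems.append("no %s under %s" % (jar_name, hadoop_home))
    return problems


if __name__ == "__main__":
    found = check_hadoop_setup()
    for line in found or ["setup matches the two steps above"]:
        print(line)
```

Run it in the same shell session where `export HADOOP_HOME=...` was issued, since an exported variable does not carry over to new terminals unless it is also added to the shell profile.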
    
