Hadoop集群 - 我需要在所有机器上复制代码才能运行任务吗?

3 投票
1 回答
1937 浏览
提问于 2025-04-17 04:58

这让我感到困惑,当我使用字数统计的例子时,我把代码放在主节点上,让从节点去处理事情,这样运行得很好。

但是当我运行我的代码时,从节点开始出错,出现一些奇怪的错误,比如:

Traceback (most recent call last):
  File "/app/hadoop/tmp/mapred/local/taskTracker/hduser/jobcache/job_201110250901_0005/attempt_201110250901_0005_m_000001_1/work/./mapper.py", line 55, in <module>
    from src.utilities import utilities
ImportError: No module named src.utilities
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
        at org.apache.hadoop.streaming.PipeMapper.map(PipeMapper.java:121)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:255)
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
        at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
        at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
        at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.Child.main(Child.java:255)  

因为我在路径上没有代码,是不是我做错了什么?

谢谢你。

1 个回答

6

在使用Hadoop Streaming的时候,如果目标机器上没有你的代码或依赖文件,就需要通过 -file 这个选项把它们复制过去。确保在Hadoop Streaming的命令中指定好你的map和reduce文件以及它们的依赖文件。

$HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper myPythonScript.py \
    -reducer /bin/wc \
    -file myPythonScript.py \
    -file myDictionary.txt \

撰写回答