在Hadoop上运行Python代码失败

6 投票
3 回答
7653 浏览
提问于 2025-04-17 18:46

我试着按照这个页面上的说明去做:

http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/

$bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar -input /user/root/wordcountpythontxt -output /user/root/wordcountpythontxt-output -mapper /user/root/wordcountpython/mapper.py -reducer /user/root/wordcountpython/reducer.py -file /user/root/mapper.py -file /user/root/reducer.py

上面说了

File: /user/root/mapper.py does not exist, or is not readable.
Streaming Command Fail

当我浏览这个网址:jobdetails.jsp/

我发现了很多错误信息

java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    ... 22 more
Caused by: java.io.IOException: Cannot run program "/user/root/wordcountpython/mapper.py": error=2, No such file or directory
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
    ... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
    at java.lang.UNIXProcess.forkAndExec(Native Method)
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
    at java.lang.ProcessImpl.start(ProcessImpl.java:91)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
    ... 24 more

我不知道怎么解决这些问题,请帮我运行这个Python程序。

3 个回答

1

看起来问题出在这一行。

Caused by: java.io.IOException: Cannot run program "/user/root/wordcountpython/mapper.py": error=2, No such file or directory

请你检查一下文件 /user/root/wordcountpython/mapper.py 是否存在。如果存在的话,看看这个文件的权限是什么。

你用来运行hadoop的用户有没有权限去执行和读取这个文件呢?

3

你可能需要检查一下在mapper.py文件中,#!这一行后面是否有类似Windows系统的换行符。如果有的话,hadoop可能找不到你的Python解释器,因为它会看到多出来的换行符。例如,它会显示成/usr/local/bin/python^M,而不是/usr/local/bin/python,其中的^M就是换行符。你可以试试在你的mapper和reducer文件上使用dos2unix命令。

3

如果你仔细查看了这个链接上的说明,

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar -file /home/hduser/mapper.py -mapper /home/hduser/mapper.py -file /home/hduser/reducer.py -reducer /home/hduser/reducer.py -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output

你会发现其实不需要把mapper.py和reducer.py这两个文件复制到HDFS(分布式文件系统)里,你可以直接从本地文件系统链接这两个文件,比如用/path/to/mapper。这样的话,你就可以避免上面提到的错误了。

撰写回答