在Hadoop上运行Python代码失败
我试着按照这个页面上的说明去做:
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
$bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar -input /user/root/wordcountpythontxt -output /user/root/wordcountpythontxt-output -mapper /user/root/wordcountpython/mapper.py -reducer /user/root/wordcountpython/reducer.py -file /user/root/mapper.py -file /user/root/reducer.py
上面说了
File: /user/root/mapper.py does not exist, or is not readable.
Streaming Command Fail
当我浏览这个网址:jobdetails.jsp/
我发现了很多错误信息
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
... 14 more
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 17 more
Caused by: java.lang.RuntimeException: configuration exception
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
... 22 more
Caused by: java.io.IOException: Cannot run program "/user/root/wordcountpython/mapper.py": error=2, No such file or directory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
... 23 more
Caused by: java.io.IOException: error=2, No such file or directory
at java.lang.UNIXProcess.forkAndExec(Native Method)
at java.lang.UNIXProcess.<init>(UNIXProcess.java:53)
at java.lang.ProcessImpl.start(ProcessImpl.java:91)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
... 24 more
我不知道怎么解决这些问题,请帮我运行这个Python程序。
3 个回答
1
看起来问题出在这一行。
Caused by: java.io.IOException: Cannot run program "/user/root/wordcountpython/mapper.py": error=2, No such file or directory
请你检查一下文件 /user/root/wordcountpython/mapper.py
是否存在。如果存在的话,看看这个文件的权限是什么。
你用来运行hadoop的用户有没有权限去执行和读取这个文件呢?
3
你可能需要检查一下在mapper.py文件中,#!这一行后面是否有类似Windows系统的换行符。如果有的话,hadoop可能找不到你的Python解释器,因为它会看到多出来的换行符。例如,它会显示成/usr/local/bin/python^M,而不是/usr/local/bin/python,其中的^M就是换行符。你可以试试在你的mapper和reducer文件上使用dos2unix命令。
3
如果你仔细查看了这个链接上的说明,
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar -file /home/hduser/mapper.py -mapper /home/hduser/mapper.py -file /home/hduser/reducer.py -reducer /home/hduser/reducer.py -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output
你会发现其实不需要把mapper.py和reducer.py这两个文件复制到HDFS(分布式文件系统)里,你可以直接从本地文件系统链接这两个文件,比如用/path/to/mapper。这样的话,你就可以避免上面提到的错误了。