Pig Hadoop Streaming help
I'm running into problems with Pig streaming. When I run an interactive Pig instance on a single machine (for what it's worth, I'm doing this over SSH/PuTTY on the master node of an AWS EMR cluster), my Pig streaming works perfectly (it also works on my Windows Cloudera VM). As soon as I switch to a multi-machine cluster, though, it stops working and I get a variety of errors.
A few things to note:
- I can run Pig scripts without any stream commands just fine on the multi-machine cluster.
- All of my Pig work is done in Pig MapReduce mode, not -x local mode.
- My Python script (stream1.py) starts with the line #!/usr/bin/env python. (A rough sketch of what such a script typically looks like follows this list.)
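For reference (the actual contents of stream1.py aren't shown here), a Pig streaming script of this kind typically just reads tab-delimited tuples on stdin and writes tab-delimited tuples back to stdout, roughly like this hypothetical stand-in:

#!/usr/bin/env python
# Hypothetical stand-in for stream1.py: Pig streaming hands each input
# tuple to the script as one tab-delimited line on stdin and reads
# tab-delimited output lines back from stdout.
import sys

for line in sys.stdin:
    fields = line.rstrip('\n').split('\t')
    # ... the real per-tuple processing would go here ...
    sys.stdout.write('\t'.join(fields) + '\n')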
Here is what I have tried so far (all of the commands below are run from the grunt shell on the master node, which I access over SSH/PuTTY):
This is how I get the Python file onto the master node so it can be used:
cp s3n://darin.emr-logs/stream1.py stream1.py
copyToLocal stream1.py /home/hadoop/stream1.py
chmod 755 stream1.py
These are the various streaming commands I have tried:
cooc = stream ct_pag_ph through `stream1.py`;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'stream1.py ' failed with exit status: 127
cooc = stream ct_pag_ph through `python stream1.py`;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'python stream1.py ' failed with exit status: 2
DEFINE X `stream1.py`;
cooc = stream ct_bag_ph through X;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'stream1.py ' failed with exit status: 127
DEFINE X `stream1.py`;
cooc = stream ct_bag_ph through `python X`;
dump cooc;
ERROR 2090: Received Error while processing the reduce plan: 'python X ' failed with exit status: 2
DEFINE X `stream1.py` SHIP('stream1.py');
cooc = STREAM ct_bag_ph THROUGH X;
dump cooc;
ERROR 2017: Internal error creating job configuration.
DEFINE X `stream1.py` SHIP('/stream1.p');
cooc = STREAM ct_bag_ph THROUGH X;
dump cooc;
DEFINE X `stream1.py` SHIP('stream1.py') CACHE('stream1.py');
cooc = STREAM ct_bag_ph THROUGH X;
ERROR 2017: Internal error creating job configuration.
define X 'python /home/hadoop/stream1.py' SHIP('/home/hadoop/stream1.py');
cooc = STREAM ct_bag_ph THROUGH X;
1 Answer
DEFINE X `stream1.py` SHIP('stream1.py');
Given your preconditions, and assuming stream1.py is in the local directory you launch Pig from, this one looks like it should work.
The way to make sure of it is:
DEFINE X `python stream1.py` SHIP('/local/path/stream1.py');
The goal of SHIP is to copy the command into the working directory of every task. Note, by the way, that exit status 127 generally means the command could not be found on the task node, which is exactly what shipping the script and invoking it explicitly with python addresses.
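Separately from the Pig syntax, it can also help to confirm on the master node that the script itself behaves as a streaming command. A rough, stand-alone check along these lines (the sample input is made up; the path matches the copyToLocal step in the question) would rule out script-side causes before blaming the cluster:

from __future__ import print_function

import os
import stat
import subprocess

# Assumed location, matching the copyToLocal/chmod steps in the question.
script = '/home/hadoop/stream1.py'

# The exec bit and the shebang are what let Pig run the script directly.
mode = os.stat(script).st_mode
print('executable bit set:', bool(mode & stat.S_IXUSR))
with open(script) as f:
    print('first line:', f.readline().rstrip())  # should be #!/usr/bin/env python

# Feed one tab-delimited line on stdin, the way Pig streaming would.
proc = subprocess.Popen([script], stdin=subprocess.PIPE, stdout=subprocess.PIPE)
out, _ = proc.communicate(b'a\tb\tc\n')
print('exit status:', proc.returncode)  # non-zero here points at the script itself
print('output:', out)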