Unix sort produces incorrect output

Posted 2024-04-28 08:54:16

I am trying to test the mapper and reducer functions of a Hadoop streaming job by running the following:

    cat data.txt | python mapper.py | sort | python reducer.py

However, the sorted output of the mapper is not in the order I expect:

    he the  1
    i       1
    i dog   1
    i like  1
    i'm     1
    i'm rob 1
    i'm the 1
    i the   1 ### this should be after "i like 1" ###
    lazy    1

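I had other people test this on their machines, and with the same mapper function and the same command line they got the correct output. So something seems to be wrong with my Unix system.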

In case it helps:

    echo $TERM
    > vt100

Any suggestions about what to try or which settings to change would be greatly appreciated. Thanks.

1 answer

#1 · Posted 2024-04-28 08:54:16

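You have your answer here; it is about the locale. `sort` orders lines according to the collation rules of the current locale, and many non-C locales give little or no weight to spaces and punctuation, which is why "i the" ends up after "i'm the" on your machine but not on others. In short, you should force the C locale for the sort step: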

    cat data.txt | python mapper.py | LC_COLLATE=C sort
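
If you want to check what order the C locale should give without depending on your shell settings, here is a minimal Python sketch. It is not part of the original answer: the helper name `c_locale_sort` is made up for illustration, the sample lines are the keys from the question, and the tab between key and count is an assumption about the mapper's output. It sorts by raw bytes, which is how `sort` compares lines under `LC_COLLATE=C`.

```python
# Sketch: reproduce the ordering that `LC_COLLATE=C sort` gives,
# by comparing lines as raw bytes instead of using locale collation.

def c_locale_sort(lines):
    """Sort lines by their UTF-8 byte values (hypothetical helper)."""
    return sorted(lines, key=lambda s: s.encode("utf-8"))


# Keys copied from the question; the tab separator is an assumption.
mapper_output = [
    "i'm the\t1",
    "i the\t1",
    "lazy\t1",
    "i like\t1",
    "i'm\t1",
    "he the\t1",
    "i dog\t1",
    "i'm rob\t1",
    "i\t1",
]

sorted_lines = c_locale_sort(mapper_output)

if __name__ == "__main__":
    for line in sorted_lines:
        print(line)
```

In byte order a space (0x20) sorts before an apostrophe (0x27), so "i the" comes right after "i like" and before every "i'm ..." key, which is the order the question expected.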