MapReduce programming: filtering a large input file

import sys
# I'm assuming that I have already read in status and portions.
# As I understand it, the number of output files depends on the number of reducers.
# What I want is to discard every row that has at least one upper-case letter in the status field.
# The output should be a single file containing only the rows whose status field is all lower case.

1 answer

User

#1 · Posted on 2024-04-20 00:27:01

You can do this without a reducer at all.

In your mapper, do something like this:

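import sys
import re

for line in sys.stdin:
    line = line.strip()
    # Split the row into tab-separated fields; the status is the last one.
    portions = re.split(r'\t+', line)
    status = portions[-1]
    # islower() is True only when the string has at least one cased
    # character and none of its cased characters is upper case.
    if status.islower():
        # portions is a list, so join it before concatenating (whatever you need here).
        whatever_you_want_to_write = status + ',' + '\t'.join(portions)
        # Emit one record per line.
        sys.stdout.write(whatever_you_want_to_write + '\n')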
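To check the filter locally before submitting a job, the same logic can be pulled into small functions. This is only a sketch: keep_row, filter_rows, and the sample rows are hypothetical names and data, not part of the original answer.

```python
import re


def keep_row(line):
    """Return True if the last tab-separated field (the status) passes str.islower()."""
    portions = re.split(r'\t+', line.strip())
    return portions[-1].islower()


def filter_rows(lines):
    """Yield only the rows whose status field has no upper-case letters."""
    for line in lines:
        if keep_row(line):
            yield line.rstrip('\n')


# Hypothetical sample rows: id, name, status.
sample = [
    "1\talice\tall good\n",
    "2\tbob\tFeeling Great\n",
    "3\tcarol\tok 123\n",
]

for row in filter_rows(sample):
    print(row)
```

Note that str.islower() also rejects a status with no letters at all (for example "123"). If such rows should be kept, a check such as not any(c.isupper() for c in status) may be closer to what the question asks for.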

See the Hadoop Streaming documentation for details on reading from and writing to HDFS.

For example, something like this:

^{pr2}$

Note how -jobconf mapred.reduce.tasks=0 is specified to tell Hadoop that you don't need a reduce step.
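As a rough illustration of a map-only launch, the argument list for a streaming job might be assembled as below. This is a sketch under assumptions, not the answer's own command: build_streaming_command is a hypothetical helper, the jar location and HDFS paths are placeholders, and the mapper script is assumed to be executable with a Python shebang line.

```python
def build_streaming_command(streaming_jar, input_path, output_path, mapper_script):
    """Build the argument list for a map-only Hadoop Streaming job."""
    return [
        "hadoop", "jar", streaming_jar,
        "-input", input_path,        # HDFS input path (placeholder)
        "-output", output_path,      # HDFS output directory (placeholder)
        "-mapper", mapper_script,    # script that reads stdin and writes stdout
        "-file", mapper_script,      # ship the script to the task nodes
        "-jobconf", "mapred.reduce.tasks=0",  # no reduce step
    ]


if __name__ == "__main__":
    cmd = build_streaming_command(
        "/path/to/hadoop-streaming.jar",  # hypothetical location
        "/user/me/input",                 # hypothetical HDFS path
        "/user/me/output",                # hypothetical HDFS path
        "mapper.py",
    )
    # Print the command; pass cmd to subprocess.run on a machine with Hadoop installed.
    print(" ".join(cmd))
```

On newer Hadoop releases -jobconf is deprecated in favor of the generic -D option, and the property is named mapreduce.job.reduces, so the exact spelling may need adjusting for your cluster.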
