java如何通过Hadoop mapreduce WordCount对最常重复的单词列表进行排序？

2 周，2 日 Questions & Answers 1999

嗨，我是hadoop mapreduce的新手

你们谁能帮我修改下面发布的代码以显示所需的输出

我有一个给定的输入文件

输入：Hi my name is John.Im doing my engineering.My parents stay at California

我得到的输出是

Hi    1
my   3
name  1
is    1
is 1
John 1
doing  1
engineering 1
parents  1
stay  1
at  1
California   1

但是我想把输出按

 my   3
 Hi   1 
 etc.....

然后所有其他的都会显示出来。其概念是显示重复次数最多的单词，应首先进行排序和显示

我在一个节点上运行这个作业。而我是作为

        $ hadoop jar job.jar input output

我已经开始

        $ hadoop namenode -format
        $ hadoop namenode

        $ hadoop datanode
        sbin$ ./yarn-daemon.sh start resourcemanager 
        sbin$ ./yarn-daemon.sh start resourcemanager

我正在运行hadoop-2.0.0-cdh4。0.0

        package org.apache.hadoop.examples;

        import java.io.IOException;
        import java.util.StringTokenizer;
        import org.apache.commons.logging.Log;
        import org.apache.commons.logging.LogFactory;

        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.io.IntWritable;
        import org.rg.apache.hadoop.fs.Path;
        import oapache.hadoop.io.Text;
        import org.apache.hadoop.mapreduce.Job;
        import org.apache.hadoop.mapreduce.Mapper;
        import org.apache.hadoop.mapreduce.Reducer;
        import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
        import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
        import org.apache.hadoop.util.GenericOptionsParser;

        public class WordCount {
        private static final Log LOG = LogFactory.getLog(WordCount.class);

          public static class TokenizerMapper
               extends Mapper<Object, Text, Text, IntWritable>{

            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(Object key, Text value, Context context
                            ) throws IOException, InterruptedException {
              StringTokenizer itr = new StringTokenizer(value.toString());
              while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
              }
            }
          }

          public static class IntSumReducer
               extends Reducer<Text,IntWritable,Text,IntWritable> {
            private IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values,
                               Context context
                               ) throws IOException, InterruptedException {
              int sum = 0;
              //printKeyAndValues(key, values);

              for (IntWritable val : values) {
                sum += val.get();
              LOG.info("val = " + val.get());
              }
              LOG.info("sum = " + sum + " key = " + key);
              result.set(sum);
              context.write(key, result);
              //System.err.println(String.format("[reduce] word: (%s), count: (%d)", key, result.get()));
            }


          // a little method to print debug output
            private void printKeyAndValues(Text key, Iterable<IntWritable> values)
            {
              StringBuilder sb = new StringBuilder();
              for (IntWritable val : values)
              {
                sb.append(val.get() + ", ");
              }
              System.err.println(String.format("[reduce] key: (%s), value: (%s)", key, sb.toString()));
            }
          }

          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length != 2) {
              System.err.println("Usage: wordcount <in> <out>");
              System.exit(2);
            }
            Job job = new Job(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
        }

如果有人能解决这个问题，我会很高兴

Python中文网

有 Java 编程相关的问题?

java如何通过Hadoop mapreduce WordCount对最常重复的单词列表进行排序？

共 (1) 个答案

# 1 楼答案