Java string matching in a large text file


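I want to match a string against a large text file: 1. Replace all non-alphanumeric characters. 2. Count the occurrences of a specific term in the text file. For example, match the term "Tom". Matching is case-insensitive, so the term "tom" should be counted, but the term "tomorrow" should not.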

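code template one:
    import java.io.*;
    import java.util.Scanner;

    // Inside main(String[] args); inputFile is declared earlier, args[1] is the term.
    int countstr = 0;
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(new FileInputStream(inputFile)))) {
        String line;
        while ((line = in.readLine()) != null) {
            // Replace every character that is not a letter, digit, or whitespace with a space.
            String newline = line.replaceAll("[^a-zA-Z0-9\\s]", " ").toLowerCase();
            // Scanner's default delimiter is any run of whitespace.
            Scanner scanner = new Scanner(newline);
            while (scanner.hasNext()) {
                String term = scanner.next();
                if (term.equalsIgnoreCase(args[1])) {
                    countstr++;
                }
            }
        }
    } catch (FileNotFoundException e1) {
        System.out.println("Not found the text file: " + inputFile);
    } catch (IOException e) {
        e.printStackTrace();
    }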

code template two:
    import java.io.*;

    // Inside main(String[] args); inputFile is declared earlier, args[1] is the term.
    int countstr = 0;
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(new FileInputStream(inputFile)))) {
        String line;
        while ((line = in.readLine()) != null) {
            // Replace every character that is not a letter, digit, or whitespace with a space.
            String newline = line.replaceAll("[^a-zA-Z0-9\\s]", " ").toLowerCase();
            String[] strArray = newline.split(" "); // split by blank space
            for (int i = 0; i < strArray.length; i++) {
                if (strArray[i].equalsIgnoreCase(args[1])) {
                    countstr++;
                }
            }
        }
    } catch (FileNotFoundException e1) {
        System.out.println("Not found the text file: " + inputFile);
    } catch (IOException e) {
        e.printStackTrace();
    }

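Running these two versions gives me different results, and the Scanner version seems to produce the correct count. However, for large text files the Scanner version runs much slower than the split version. Can anyone tell me the reason and suggest a more efficient solution?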

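Answer: use a regular expression with word boundaries (\b), so that "tom" matches as a whole word but "tomorrow" does not. For a single word:

    // Needs java.util.regex.Pattern and java.util.regex.Matcher.
    // Pattern.quote keeps any regex metacharacters in the term literal.
    String key = "\\b" + Pattern.quote("Tom".toLowerCase()) + "\\b";
    Pattern p = Pattern.compile(key);
    word = word.toLowerCase().replaceAll("[^a-zA-Z0-9\\s]", "");
    Matcher m = p.matcher(word);
    if (m.find()) {
        countstr++;
    }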

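For the whole file, read line by line:

    // Needs java.io.BufferedReader, java.nio.file.Files, java.nio.charset.StandardCharsets,
    // java.util.regex.Pattern and java.util.regex.Matcher.
    // inputFile is a java.nio.file.Path here; args[0] is the term.
    String key = "\\b" + Pattern.quote(args[0].toLowerCase()) + "\\b";
    Pattern p = Pattern.compile(key); // compile once, reuse for every line
    try (final BufferedReader br = Files.newBufferedReader(inputFile, StandardCharsets.UTF_8)) {
        for (String line; (line = br.readLine()) != null;) {
            // processing the line: replace with a space (as in the question) so
            // that neighbouring words are not glued together.
            line = line.toLowerCase().replaceAll("[^a-zA-Z0-9\\s]", " ");
            Matcher m = p.matcher(line);
            while (m.find()) { // count every occurrence on the line, not only the first
                countstr++;
            }
        }
    }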
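On why the two templates in the question disagree: a likely cause, assuming the input contains tabs or other non-space whitespace, is the tokenization. The character class in the question keeps \s characters, Scanner splits on any run of whitespace, and split(" ") splits only on a single space character, so a token such as "tom<TAB>jerry" stays glued together and never equals the term. The sketch below reproduces this on one line; the class and method names (TokenizeDemo, countWithScanner, countWithSplit) are made up for illustration and do not come from the question.

```java
import java.util.Scanner;

public class TokenizeDemo {
    // Counts tokens equal to term, tokenizing the way template one does
    // (Scanner's default delimiter: any run of whitespace).
    static int countWithScanner(String line, String term) {
        int count = 0;
        try (Scanner scanner = new Scanner(line)) {
            while (scanner.hasNext()) {
                if (scanner.next().equalsIgnoreCase(term)) {
                    count++;
                }
            }
        }
        return count;
    }

    // Counts tokens equal to term after splitting on the given regex,
    // as template two does with " ".
    static int countWithSplit(String line, String delimiterRegex, String term) {
        int count = 0;
        for (String token : line.split(delimiterRegex)) {
            if (token.equalsIgnoreCase(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "tom\tjerry tom"; // a tab between the first two words
        System.out.println(countWithScanner(line, "tom"));      // prints 2
        System.out.println(countWithSplit(line, " ", "tom"));   // prints 1
        System.out.println(countWithSplit(line, "\\s+", "tom")); // prints 2
    }
}
```

If this is indeed the cause, changing split(" ") to split("\\s+") in template two should bring its count in line with the Scanner version. The Scanner version is probably slower because it constructs a new Scanner for every line and tokenizes with regular expressions internally, but that is worth confirming with a measurement on the actual file.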

