regex: Java word finder program does not capture all unique items in a string

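I am writing a simple program that finds all the words in a given string and puts every unique word into an ArrayList. (Much like what list.sort() does to a list in Python.)

However, with my test input the program skips a word. I would really appreciate any insight into why it is not catching all the words.

Here is my code: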

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class wordFinder {
public static void main(String[] args) {
    String input = "This is a test This is a test This is a test This is a test This is another test This is not a test";
    ArrayList<String> wordList = new ArrayList<>();
    Pattern pattern = Pattern.compile("\\w+");
    Matcher match = pattern.matcher(input);
    while(match.find()) {
        wordList.add(match.group());
    }
    System.out.println(wordList);
    for (int i = 0; i < wordList.size(); i++){
        for(int q = i; q< wordList.size(); q++){
            if(wordList.get(i).equals(wordList.get(q))){
                wordList.remove(q);
            }
            else continue;
        }

    }
    System.out.println(wordList);
}

}

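Also, I know the regex and Pattern/Matcher are not strictly needed, since I could just split the string. I am doing it this way because I want to extend the program later to search for more than one specific thing.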


3 answers

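  1. # Answer 1

    This is a better option than adding and then removing. Also, like I said, what are you using this for? If you are using it as a word bank, you may want to consider other structures!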

    public static void main(String[] args) {
        ArrayList<String> list = new ArrayList<String>();
        String input = "This is a test This is a test This is a test This is a test This is another test This is not a test";
        // Split on runs of whitespace so repeated spaces do not yield empty tokens
        String[] tokens = input.split("\\s+");
        for (int i = 0; i < tokens.length; ++i) {
            // Add each token only the first time it is seen
            if (!list.contains(tokens[i])) {
                list.add(tokens[i]);
            }
        }
        System.out.println(list);
    }
    
  2. # Answer 2

    Please try this:

    public static void main(String[] args) {
        String input = "This is a test This is a test test test This test This is a test This is a test This is another test This is not a test";
        ArrayList<String> wordList = new ArrayList<>();
        Pattern pattern = Pattern.compile("\\w+");
        Matcher match = pattern.matcher(input);

        while (match.find()) {
            // Add a word only if it is not already in the list
            if (!wordList.contains(match.group())) {
                wordList.add(match.group());
            }
        }

        System.out.println(wordList);
    }
    

    Output: [This, is, a, test, another, not]

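  3. # Answer 3

    You can simply use a Set (a collection that contains no duplicate elements). That is what it is designed for. Your approach to removing duplicates is flawed. Step through it in a debugger and you will see when the word "another" gets removed (hint: when i = q).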

    public static void main(String[] args) {
        String input = "This is a test This is a test This is a test This is a test This is another test This is not a test";
        Set<String> wordList = new HashSet<>();
        Pattern pattern = Pattern.compile("\\w+");
        Matcher match = pattern.matcher(input);
        while(match.find()) {
            wordList.add(match.group());
        }
        System.out.println(wordList);
    }
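
For readers who want to see why the loop in the question misbehaves, here is a minimal sketch, not the asker's or any answerer's code. In the question, `q` starts at `i`, so every element is first compared with itself and removed; and after `remove(q)` the remaining elements shift left, so `q++` skips one. The hypothetical `removeDuplicatesInPlace` below starts after `i` and walks backwards so removals never shift an unvisited element. The hypothetical `uniqueWords` uses `LinkedHashSet`, which, unlike the `HashSet` in Answer 3, keeps insertion order. The class name, method names, and the shortened input are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UniqueWordsSketch {

    // Collect every \w+ match in order of appearance.
    public static List<String> findWords(String input) {
        List<String> words = new ArrayList<>();
        Matcher matcher = Pattern.compile("\\w+").matcher(input);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    // In-place variant of the question's nested loop.
    // The inner index stops before i, so an element is never compared with itself,
    // and it runs backwards, so remove(q) only shifts elements already visited.
    public static void removeDuplicatesInPlace(List<String> words) {
        for (int i = 0; i < words.size(); i++) {
            for (int q = words.size() - 1; q > i; q--) {
                if (words.get(i).equals(words.get(q))) {
                    words.remove(q); // remove(int index), not remove(Object)
                }
            }
        }
    }

    // Set-based variant: LinkedHashSet drops duplicates and keeps first-seen order.
    public static Set<String> uniqueWords(String input) {
        return new LinkedHashSet<>(findWords(input));
    }

    public static void main(String[] args) {
        // Shortened version of the question's input (an assumption for brevity).
        String input = "This is a test This is another test This is not a test";

        List<String> words = findWords(input);
        removeDuplicatesInPlace(words);
        System.out.println(words);              // [This, is, a, test, another, not]

        System.out.println(uniqueWords(input)); // [This, is, a, test, another, not]
    }
}
```

The nested loop is quadratic in the number of words, so for anything beyond small inputs the set-based variant is probably the better default; the in-place version is mainly useful for seeing what went wrong in the original.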