Java: an alternative to String.contains() that can return a similarity score

I have three strings:

String a = "Hello, how are you doing?";
String b = "Can I as you something?";
String c = "Hello, how are you doing? Can I ask you something?";

My goal is to determine whether string c is a merge of strings a and b. Note that string b contains a typo: "as" should be "ask".

My current logic is:

boolean merge = c.contains(a) && c.contains(b);

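The problem is that if the text changes slightly during the merge, String.contains() no longer works: here it returns false when checking string b against string c.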
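To make the failure concrete, here is a minimal sketch that runs the check described above on the three strings from the question. The class name `ContainsDemo` and the helper `isExactMerge` are made up for illustration only.

```java
class ContainsDemo {
    // Hypothetical helper: the exact-match check from the question.
    static boolean isExactMerge(String a, String b, String c) {
        return c.contains(a) && c.contains(b);
    }

    public static void main(String[] args) {
        String a = "Hello, how are you doing?";
        String b = "Can I as you something?"; // typo: "as" instead of "ask"
        String c = "Hello, how are you doing? Can I ask you something?";

        System.out.println(c.contains(a));         // true
        System.out.println(c.contains(b));         // false, because of the typo
        System.out.println(isExactMerge(a, b, c)); // false
    }
}
```

A single missing character in b is enough to make the whole check fail, which is why an approximate comparison is needed.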

Is there an alternative approach that would work in this case?

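I tried string similarity measures (Jaccard, etc.), but they don't work, because the sizes of a, b and c can vary, so it is not easy (or even possible) to get a meaningful similarity percentage.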
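The size problem can be illustrated with a small sketch. It assumes Jaccard similarity is computed over whitespace-separated word sets, which is only one of several possible definitions; the class name `JaccardDemo` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class JaccardDemo {
    // Jaccard similarity over word sets: |A intersect B| / |A union B|.
    static double jaccard(String x, String y) {
        Set<String> wordsX = new HashSet<>(Arrays.asList(x.split(" ")));
        Set<String> wordsY = new HashSet<>(Arrays.asList(y.split(" ")));

        Set<String> intersection = new HashSet<>(wordsX);
        intersection.retainAll(wordsY);

        Set<String> union = new HashSet<>(wordsX);
        union.addAll(wordsY);

        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        String a = "Hello, how are you doing?";
        String b = "Can I as you something?";
        String c = "Hello, how are you doing? Can I ask you something?";

        // a is fully contained in c, yet the score is only 5/9 (about 0.56)
        System.out.println(jaccard(a, c));
        // b differs from its part of c by one typo, yet the score is only 0.4
        System.out.println(jaccard(b, c));
    }
}
```

Because c is roughly twice as long as a or b, the union is always much larger than the intersection, so the score stays low even when the shorter string is fully contained.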


2 answers

  1. # Answer 1

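    As far as I know there is no built-in function that does this, but I came up with something that will hopefully meet your needs. You can obviously adapt it (I tried to keep it as clean as possible).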

    Step 1: we need a function that takes two strings and returns the number of differences between them. I came up with a very simple one:

    public static int getNumberDifferences(String a, String b)
    {
        int maxLength = Math.max(a.length(), b.length());
        int minLength = Math.min(a.length(), b.length());
        int result = maxLength - minLength;//the difference in length between the two
    
        for(int i = 0; i < minLength; i++)
        {
            if(a.charAt(i) != b.charAt(i)) //If the characters are different
                result++; //Add one to the result
        }
    
        return  result;
    }
    

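    In short, we walk through both strings and add one to the count each time the characters at the same position differ. (Note that the count starts at the difference in length between the two strings, so a difference in size is counted as well.)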
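    A quick way to get a feel for this function is to run it on a few word pairs. The sketch below copies the function into a hypothetical class `PositionalDiffDemo`. Because characters are compared position by position, a character missing at the start of a word shifts everything after it, so the count can be larger than the number of actual typos.

    ```java
    class PositionalDiffDemo {
        // Same logic as the function above: length difference plus
        // the number of positions where the characters differ.
        static int getNumberDifferences(String a, String b) {
            int maxLength = Math.max(a.length(), b.length());
            int minLength = Math.min(a.length(), b.length());
            int result = maxLength - minLength;

            for (int i = 0; i < minLength; i++) {
                if (a.charAt(i) != b.charAt(i)) {
                    result++;
                }
            }
            return result;
        }

        public static void main(String[] args) {
            System.out.println(getNumberDifferences("ask", "ask")); // 0
            System.out.println(getNumberDifferences("as", "ask"));  // 1 (missing last letter)
            System.out.println(getNumberDifferences("sk", "ask"));  // 3 (missing first letter shifts the rest)
        }
    }
    ```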

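    Step 2: we need another function that takes the two strings split into arrays of words and returns the total number of differences it finds. Again, a very simple function: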

    public static int getNumberDifferences(String[] a, String[] b)
    {
        int result = 0;
    
        for(int i = 0; i < Math.min(a.length, b.length); i++)
        {
            result += getNumberDifferences(a[i], b[i]);
        }
    
        return result;
    }
    

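    This function simply sums the differences between each pair of corresponding words. Note that words beyond the length of the shorter array are not compared.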

    Finally, we put it to use:

    public static void main(String[] args)
    {
        String a = "Hello, how are you doing?" ;
        String b = "Can I ask you something?";
        String c = "Hello, how are you doing? Can I ask you something?";
    
        int differences = getNumberDifferences(
                (a + " " + b) //Join the two strings with a space in the middle
                        .split(" "), //Split them to take every word
                c.split(" ")); //Split c as well
    
        System.out.println(differences);
    }
    

    The complete code:

    public class Main {
    
    public static void main(String[] args)
    {
        String a = "Hello, how are you doing?" ;
        String b = "Can I ask you something?";
        String c = "Hello, how are you doing? Can I ask you something?";
    
        int differences = getNumberDifferences(
                (a + " " + b) //Join the two strings with a space in the middle
                        .split(" "), //Split them to take every word
                c.split(" ")); //Split c as well
    
        System.out.println(differences);
    }
    
    public static int getNumberDifferences(String[] a, String[] b)
    {
        int result = 0;
    
        for(int i = 0; i < Math.min(a.length, b.length); i++)
        {
            result += getNumberDifferences(a[i], b[i]);
        }
    
        return result;
    }
    
    public static int getNumberDifferences(String a, String b)
    {
        int maxLength = Math.max(a.length(), b.length());
        int minLength = Math.min(a.length(), b.length());
        int result = maxLength - minLength; //the difference in length between the two
    
        for(int i = 0; i < minLength; i++)
        {
            if(a.charAt(i) != b.charAt(i)) //If the characters are different
                result++; //Add one to the result
        }
    
        return  result;
    }
    }

    Please let me know if this helps :)

  2. # Answer 2

    As was correctly pointed out in the comments, you should compare the strings using the Levenshtein distance.

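    You want to compare two strings using a similarity percentage, so we can relate that percentage to the ratio between the distance between the strings and the length of the reference string. If we need 100% similarity, the strings must be exactly equal and the distance between them is 0. Conversely, if we require 0% similarity, the strings must be completely different, and the distance is about the same as the length of the reference string (or more).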
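    As a worked example of that ratio, the sketch below computes it for the strings from the question. It is only an illustration: the class `DiscrepancyDemo` and the helper `discrepancy` are hypothetical names, and the two-row Levenshtein implementation is a common textbook variant, not necessarily the one used in this answer.

    ```java
    class DiscrepancyDemo {
        // Levenshtein distance by dynamic programming, keeping only two rows.
        static int levenshtein(String x, String y) {
            int[] prev = new int[y.length() + 1];
            int[] curr = new int[y.length() + 1];
            for (int j = 0; j <= y.length(); j++) {
                prev[j] = j; // turning "" into y[0..j) takes j insertions
            }
            for (int i = 1; i <= x.length(); i++) {
                curr[0] = i; // turning x[0..i) into "" takes i deletions
                for (int j = 1; j <= y.length(); j++) {
                    int substitution = prev[j - 1] + (x.charAt(i - 1) == y.charAt(j - 1) ? 0 : 1);
                    int deletion = prev[j] + 1;
                    int insertion = curr[j - 1] + 1;
                    curr[j] = Math.min(substitution, Math.min(deletion, insertion));
                }
                int[] tmp = prev;
                prev = curr;
                curr = tmp;
            }
            return prev[y.length()];
        }

        // Hypothetical helper: distance relative to the reference length, capped at 1.0.
        static double discrepancy(String reference, String tested) {
            if (reference.isEmpty()) {
                return tested.isEmpty() ? 0.0 : 1.0;
            }
            return Math.min(1.0, (double) levenshtein(reference, tested) / reference.length());
        }

        public static void main(String[] args) {
            String a = "Hello, how are you doing?";
            String b = "Can I as you something?";
            String c = "Hello, how are you doing? Can I ask you something?";
            String merged = a + " " + b;

            System.out.println(levenshtein(c, merged)); // 1 (the missing "k")
            System.out.println(discrepancy(c, merged)); // 0.02, i.e. 1 edit over 50 characters
        }
    }
    ```

    With a discrepancy of 0.02, c would count as a merge of a and b under any threshold of 2% or more.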

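    I named the threshold allowedDiscrepancy rather than similarity percentage because it is more informative: 0.0 means the strings must be identical, and 1.0 means they may be completely different. The code has a distance method that computes the distance between the reference string and another string, and a compareWithDiscrepancy method that checks the resulting ratio against the threshold. Have a look, it works: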

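    import java.util.Arrays;

    // Assumption: assertTrue/assertFalse are the JUnit 4 assertions (JUnit must be on the classpath)
    import static org.junit.Assert.assertFalse;
    import static org.junit.Assert.assertTrue;

    public class StringUtils {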
        public static void main(String[] args) {
            final String a = "Hello, how are you doing?";
            final String b = "Can I as you something?";
            final String c = "Hello, how are you doing? Can I ask you something?";
    
            // allowedDiscrepancy = 1.0 (100%): the strings may be completely different.
            // These two strings differ only slightly, so this must return true.
            assertTrue(compareWithDiscrepancy(c, String.format("%s %s", a, b), 1.0));
            // allowedDiscrepancy = 0.0 (0%): the strings must be exactly equal.
            // These two strings differ slightly (by more than 0), so this must return false.
            assertFalse(compareWithDiscrepancy(c, String.format("%s %s", a, b), 0.0));
    
            final String sameA = "Hello.";
            final String sameB = "How are you?";
            final String sameC = String.format("%s %s", sameA, sameB);
    
            // allowedDiscrepancy = 1.0 (100%): the strings may be completely different.
            // These two strings are exactly equal, so this must return true.
            assertTrue(compareWithDiscrepancy(sameC, String.format("%s %s", sameA, sameB), 1));
            // allowedDiscrepancy = 0.0 (0%): the strings must be exactly equal.
            // These two strings are exactly equal, so this must return true as well.
            assertTrue(compareWithDiscrepancy(sameC, String.format("%s %s", sameA, sameB), 0));
    
            final String differentA = "Part 1.";
            final String differentB = "Part 2.";
            final String differentC = "Absolutely different string";
    
            // allowedDiscrepancy = 1.0 (100%): the strings may be completely different.
            // These two strings are completely different, so this must return true.
            assertTrue(compareWithDiscrepancy(differentC, String.format("%s %s", differentA, differentB), 1));
            // allowedDiscrepancy = 0.0 (0%): the strings must be exactly equal.
            // These two strings are completely different, so this must return false.
            assertFalse(compareWithDiscrepancy(differentC, String.format("%s %s", differentA, differentB), 0));
    
            System.out.println("Done!");
        }
    
        public static boolean compareWithDiscrepancy(final String referenceString, final String testedString, double allowedDiscrepancy) {
            // Clamp the threshold to the range [0, 1]
            if (allowedDiscrepancy < 0) allowedDiscrepancy = 0;
            if (allowedDiscrepancy > 1) allowedDiscrepancy = 1;

            int distance = distance(referenceString, testedString);
            // Math.max(1, ...) avoids dividing by zero when the reference string is empty
            double realDiscrepancy = distance * 1.0 / Math.max(1, referenceString.length());
            if (realDiscrepancy > 1) realDiscrepancy = 1;
            return allowedDiscrepancy >= realDiscrepancy;
        }
    
        // Levenshtein distance between x and y, computed with dynamic programming
        static int distance(String x, String y) {
            int[][] dp = new int[x.length() + 1][y.length() + 1];
    
            for (int i = 0; i <= x.length(); i++) {
                for (int j = 0; j <= y.length(); j++) {
                    if (i == 0) {
                        dp[i][j] = j;
                    } else if (j == 0) {
                        dp[i][j] = i;
                    } else {
                        dp[i][j] = min(dp[i - 1][j - 1]
                                + cost(x.charAt(i - 1), y.charAt(j - 1)),
                            dp[i - 1][j] + 1,
                            dp[i][j - 1] + 1);
                    }
                }
            }
    
            return dp[x.length()][y.length()];
        }
    
        public static int cost(char a, char b) {
            return a == b ? 0 : 1;
        }
    
        public static int min(int... numbers) {
            return Arrays.stream(numbers)
                .min().orElse(Integer.MAX_VALUE);
        }
    }