C++ string diff (similar to Python's difflib)

4 votes
5 answers
2876 views
Asked 2025-04-11 09:33

I want to compare two strings and check whether they differ only in a single numeric field; for example:

varies_in_single_number_field('foo7bar', 'foo123bar')
# Returns True, because 7 != 123, and there's only one varying
# number region between the two strings.

In Python, I can do this with difflib:

import difflib, doctest

def varies_in_single_number_field(str1, str2):
    """
    A typical use case is as follows:
        >>> varies_in_single_number_field('foo7bar00', 'foo123bar00')
        True

    Numerical variation in two dimensions is no good:
        >>> varies_in_single_number_field('foo7bar00', 'foo123bar01')
        False

    Varying in a nonexistent field is okay:
        >>> varies_in_single_number_field('foobar00', 'foo123bar00')
        True

    Identical strings don't *vary* in any number field:
        >>> varies_in_single_number_field('foobar00', 'foobar00')
        False
    """
    in_differing_substring = False
    passed_differing_substring = False # There should be only one.
    differ = difflib.Differ()
    for letter_diff in differ.compare(str1, str2):
        letter = letter_diff[2:]
        if letter_diff.startswith(('-', '+')):
            if passed_differing_substring: # Already saw a varying field.
                return False
            in_differing_substring = True
            if not letter.isdigit(): return False # Non-digit diff character.
        elif in_differing_substring: # Diff character not found - end of diff.
            in_differing_substring = False
            passed_differing_substring = True
    # No variation if no diff was seen; a diff that runs to the end of the string still counts.
    return passed_differing_substring or in_differing_substring

if __name__ == '__main__': doctest.testmod()

But I don't know whether C++ has anything like difflib. Suggestions for other approaches are welcome. :)

5 Answers

1

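This may be overkill, but you could use Boost to interface with Python. In the worst case, difflib is written in pure Python and is not very large, so porting it from Python to C should be feasible...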
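
To give a rough idea of what a hand-written port might look like, here is a minimal sketch. It is not a translation of difflib: it assumes a plain longest-common-subsequence table as a stand-in for difflib's own matching algorithm, and the names `DiffOp`, `char_diff` and `varies_in_single_number_field_diff` are made up for this illustration. The second function follows the same loop structure as the Python code in the question.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One diff entry: '-' (only in a), '+' (only in b) or ' ' (in both).
struct DiffOp {
    char tag;
    char ch;
};

// Character-level diff built from a longest-common-subsequence table.
// ASSUMPTION: this is a simple stand-in, not difflib's matching algorithm.
std::vector<DiffOp> char_diff(const std::string &a, const std::string &b) {
    const std::size_t n = a.size(), m = b.size();
    // lcs[i][j] = length of the LCS of a[i..] and b[j..].
    std::vector<std::vector<std::size_t>> lcs(n + 1, std::vector<std::size_t>(m + 1, 0));
    for (std::size_t i = n; i-- > 0;) {
        for (std::size_t j = m; j-- > 0;) {
            lcs[i][j] = (a[i] == b[j]) ? lcs[i + 1][j + 1] + 1
                                       : std::max(lcs[i + 1][j], lcs[i][j + 1]);
        }
    }
    // Walk the table from the front and emit one entry per character.
    std::vector<DiffOp> ops;
    std::size_t i = 0, j = 0;
    while (i < n && j < m) {
        if (a[i] == b[j]) {
            ops.push_back({' ', a[i]}); ++i; ++j;
        } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
            ops.push_back({'-', a[i]}); ++i;
        } else {
            ops.push_back({'+', b[j]}); ++j;
        }
    }
    for (; i < n; ++i) ops.push_back({'-', a[i]});
    for (; j < m; ++j) ops.push_back({'+', b[j]});
    return ops;
}

// Same state machine as the Python loop: at most one differing region,
// and every differing character must be a digit.
bool varies_in_single_number_field_diff(const std::string &a, const std::string &b) {
    bool in_region = false, passed_region = false;
    for (const DiffOp &op : char_diff(a, b)) {
        if (op.tag != ' ') {
            if (passed_region) return false;  // second differing region
            if (!std::isdigit(static_cast<unsigned char>(op.ch))) return false;
            in_region = true;
        } else if (in_region) {
            in_region = false;
            passed_region = true;
        }
    }
    // A differing region that runs to the end of the string counts too.
    return in_region || passed_region;
}
```

The table costs O(n*m) time and memory, which should be fine for short identifiers but is something to keep in mind for long inputs.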

1

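You can take an ad hoc approach: you want to compare two strings s and s', where s = abc and s' = ab'c, and b and b' are two different numbers (either may be empty). Then you can proceed as follows: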

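  1. Starting from the left, compare the two strings character by character until you hit the first differing character, then stop.
  2. Do the same from the right, until you hit a differing character or reach the marker left by the scan from the left.
  3. Then check whether what remains in the middle of each string consists only of digits.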
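
The three steps above could be sketched roughly like this. This is only one possible reading of the description, using `std::mismatch` for both scans; `all_digits` and `differs_only_in_one_number` are hypothetical names, and the sketch assumes that an empty middle part on one side is acceptable (the "may be empty" case).

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if every character is a decimal digit (also true for "").
static bool all_digits(const std::string &s) {
    return std::all_of(s.begin(), s.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}

bool differs_only_in_one_number(const std::string &a, const std::string &b) {
    if (a == b) return false;  // identical strings do not vary at all
    const std::string &longer  = a.size() >= b.size() ? a : b;
    const std::string &shorter = a.size() >= b.size() ? b : a;

    // Step 1: length of the common prefix (scan from the left).
    const std::size_t prefix = static_cast<std::size_t>(
        std::mismatch(shorter.begin(), shorter.end(), longer.begin()).first -
        shorter.begin());

    // Step 2: length of the common suffix (scan from the right), stopping at
    // the left marker so prefix and suffix never overlap in the shorter string.
    const std::string::difference_type skip =
        static_cast<std::string::difference_type>(prefix);
    const std::size_t suffix = static_cast<std::size_t>(
        std::mismatch(shorter.rbegin(), shorter.rend() - skip, longer.rbegin()).first -
        shorter.rbegin());

    // Step 3: whatever is left in the middle must be digits in both strings.
    return all_digits(shorter.substr(prefix, shorter.size() - prefix - suffix)) &&
           all_digits(longer.substr(prefix, longer.size() - prefix - suffix));
}
```

Both scans are linear, so the whole check runs in O(n) without building a diff.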

2

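This might work; it at least passes your demonstration tests:
Edit: I've made some changes to the code to fix a few string indexing problems. I believe it should be fine now.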

#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <cctype>

bool starts_with(const std::string &s1, const std::string &s2) {
    return (s1.length() <= s2.length()) && (s2.substr(0, s1.length()) == s1);
}

bool ends_with(const std::string &s1, const std::string &s2) {
    return (s1.length() <= s2.length()) && (s2.substr(s2.length() - s1.length()) == s1);
}

bool is_numeric(const std::string &s) {
    for(std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
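        // Cast first: passing a negative char value to std::isdigit is undefined behavior.
        if(!std::isdigit(static_cast<unsigned char>(*it))) {
            return false;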
        }
    }
    return true;
}

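bool varies_in_single_number_field(std::string s1, std::string s2) {

    if(s1 == s2) {
        return false; // Identical strings do not vary at all.
    }

    // Make s1 the longer string *before* computing any index from its length.
    if(s1.length() < s2.length()) {
        s1.swap(s2);
    }

    // Grow the common prefix while the first (prefix + 1) characters of s1 still start s2.
    size_t prefix = 0;
    while(prefix < s2.length() && starts_with(s1.substr(0, prefix + 1), s2)) { prefix++; }

    // Grow the common suffix the same way, without overlapping the prefix inside s2.
    size_t suffix = 0;
    while(suffix < s2.length() - prefix && ends_with(s1.substr(s1.length() - suffix - 1), s2)) { suffix++; }

    // The part between prefix and suffix must be numeric in both strings (it may be empty in s2).
    return is_numeric(s1.substr(prefix, s1.length() - prefix - suffix))
        && is_numeric(s2.substr(prefix, s2.length() - prefix - suffix));

}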

int main() {
    std::cout << std::boolalpha << varies_in_single_number_field("foo7bar00", "foo123bar00") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("foo7bar00", "foo123bar01") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("foobar00", "foo123bar00") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("foobar00", "foobar00") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("7aaa", "aaa") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("aaa7", "aaa") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("aaa", "7aaa") << std::endl;
    std::cout << std::boolalpha << varies_in_single_number_field("aaa", "aaa7") << std::endl;
}

Basically, it looks for a string made up of three parts: string2 starts with part1 and ends with part3, and part2 in the middle may contain only digits.
