C++ 字符串差异(类似 Python 的 difflib)
我想比较两个字符串,看看它们是否只在某个数字部分有所不同;比如说,
varies_in_single_number_field('foo7bar', 'foo123bar')
# Returns True, because 7 != 123, and there's only one varying
# number region between the two strings.
在Python中,我可以用 difflib
来做到这一点:
import difflib, doctest
def varies_in_single_number_field(str1, str2):
"""
A typical use case is as follows:
>>> varies_in_single_number_field('foo7bar00', 'foo123bar00')
True
Numerical variation in two dimensions is no good:
>>> varies_in_single_number_field('foo7bar00', 'foo123bar01')
False
Varying in a nonexistent field is okay:
>>> varies_in_single_number_field('foobar00', 'foo123bar00')
True
Identical strings don't *vary* in any number field:
>>> varies_in_single_number_field('foobar00', 'foobar00')
False
"""
in_differing_substring = False
passed_differing_substring = False # There should be only one.
differ = difflib.Differ()
for letter_diff in differ.compare(str1, str2):
letter = letter_diff[2:]
if letter_diff.startswith(('-', '+')):
if passed_differing_substring: # Already saw a varying field.
return False
in_differing_substring = True
if not letter.isdigit(): return False # Non-digit diff character.
elif in_differing_substring: # Diff character not found - end of diff.
in_differing_substring = False
passed_differing_substring = True
return passed_differing_substring # No variation if no diff was passed.
if __name__ == '__main__': doctest.testmod()
但是我不知道在C++中有没有类似 difflib
的东西。欢迎其他方法的建议。:)
5 个回答
1
这可能有点过于复杂了,但你可以用boost来和python进行连接。最糟糕的情况是,difflib这个库是用纯python写的,而且代码量不大。把它从python移植到C语言应该是可行的……
1
你可以用一种灵活的方法来解决这个问题:你想要比较两个字符串 s 和 s',其中 s=abc,s'=ab'c,b 和 b' 应该是两个不同的数字(可以是空的)。那么,你可以这样做:
- 从左边开始,一个字符一个字符地比较这两个字符串,直到遇到不同的字符为止,然后停止。
- 同样地,从右边开始比较,直到遇到不同的字符,或者遇到左边的标记。
- 然后检查中间剩下的部分,看看它们是否都是数字。
2
这可能有效,至少通过了你的演示测试:
编辑:我对代码做了一些修改,以解决一些字符串索引的问题。我相信现在应该没问题了。
#include <iostream>
#include <string>
#include <vector>
#include <algorithm>
#include <cctype>
bool starts_with(const std::string &s1, const std::string &s2) {
return (s1.length() <= s2.length()) && (s2.substr(0, s1.length()) == s1);
}
bool ends_with(const std::string &s1, const std::string &s2) {
return (s1.length() <= s2.length()) && (s2.substr(s2.length() - s1.length()) == s1);
}
bool is_numeric(const std::string &s) {
for(std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
if(!std::isdigit(*it)) {
return false;
}
}
return true;
}
bool varies_in_single_number_field(std::string s1, std::string s2) {
size_t index1 = 0;
size_t index2 = s1.length() - 1;
if(s1 == s2) {
return false;
}
if((s1.empty() && is_numeric(s2)) || (s2.empty() && is_numeric(s1))) {
return true;
}
if(s1.length() < s2.length()) {
s1.swap(s2);
}
while(index1 < s1.length() && starts_with(s1.substr(0, index1), s2)) { index1++; }
while(ends_with(s1.substr(index2), s2)) { index2--; }
return is_numeric(s1.substr(index1 - 1, (index2 + 1) - (index1 - 1)));
}
int main() {
std::cout << std::boolalpha << varies_in_single_number_field("foo7bar00", "foo123bar00") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("foo7bar00", "foo123bar01") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("foobar00", "foo123bar00") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("foobar00", "foobar00") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("7aaa", "aaa") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("aaa7", "aaa") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("aaa", "7aaa") << std::endl;
std::cout << std::boolalpha << varies_in_single_number_field("aaa", "aaa7") << std::endl;
}
基本上,它是在寻找一个字符串,这个字符串分成三部分:字符串string2的开头是part1,结尾是part3,中间的part2只能是数字。