How to find the union of two strings while preserving order

Posted 2024-04-26 07:05:05


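I have two strings and want to find their union while preserving word order. The reason is that I am trying several ways to OCR an image and get different results. I want to combine all the different results into a single result that has the most content.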

This is the minimum I am looking for:

#example1
string1 = "This is a test trees are green roses are red"
string2 = "This iS a TEST trees 12.48.1952 anthony gonzalez"
finalstring = "this is a test trees are green roses are red 12.48.1952 anthony gonzalez" 

#example2
string2 = "This is a test trees are green roses are red"
string1 = "This iS a TEST trees 12.48.1952 anthony gonzalez"
finalstring = "this is a test trees are green roses are red 12.48.1952 anthony gonzalez"

#example3
string1 = "telephone conversation in some place big image on screen"
string2 = "roses are red telephone conversation in some place big image on screen"
finalstring = "roses are red telephone conversation in some place big image on screen"
#or the following - both are fine in this scenario.
finalstring = "telephone conversation in some place big image on screen roses are red"

This is what I have tried:

[code block missing from the source]

3 answers
" ".join(x if i >= len(string2.split()) or x == string2.lower().split()[i] else " ".join((x, string2.split()[i])) for i, x in enumerate(string1.lower().split()))

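You can do this with a generator expression and join. enumerate sets i to the index of a word in string1 and x to the word itself. The expression then checks whether x equals the word at index i of string2 (compared in lowercase); if it does not, the word at index i of string2 is added after x, so both words end up in the final string. Once i runs past the end of string2, x is kept on its own.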
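The same logic may be easier to follow as an explicit loop. The sketch below is not the answerer's code: interleave_merge is a hypothetical name, and unlike the one-liner it lowercases the words taken from both strings.

```python
def interleave_merge(first, second):
    """Hypothetical helper: explicit-loop version of the one-liner's logic."""
    words1 = first.lower().split()
    words2 = second.lower().split()
    out = []
    for i, word in enumerate(words1):
        out.append(word)
        # If the second string has a different word at the same position,
        # place it right after the word from the first string.
        if i < len(words2) and word != words2[i]:
            out.append(words2[i])
    return " ".join(out)


string1 = "This is a test trees are green roses are red"
string2 = "This iS a TEST trees 12.48.1952 anthony gonzalez"
print(interleave_merge(string1, string2))
# this is a test trees are 12.48.1952 green anthony roses gonzalez are red
```

Note that the differing words are interleaved position by position, so the result is not the finalstring asked for in the question. Words of the second string that lie beyond the length of the first are also dropped, which matters when the arguments are swapped as in example 2.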

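Don't use a set for this. As you will have noticed, set() keeps only unique objects, so a repeated word such as "are" appears only once in the final result.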

string1 = "This is a test trees are green roses are red"
string2 = "This iS a TEST trees 12.48.1952 anthony gonzalez"

str_lst = string1.split()

for s, t in zip(string1.split(), string2.split()):
    # keep the word from string2 when it differs (ignoring case)
    # from the word at the same position in string1
    if s.lower() != t.lower():
        str_lst.append(t)

string = " ".join(s.lower() for s in str_lst)
#this is a test trees are green roses are red 12.48.1952 anthony gonzalez
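One caveat: zip stops at the shorter input, so when string2 has more words than string1 (as in example 2 of the question), the trailing words of string2 are never examined. A possible variant, offered only as a sketch, uses itertools.zip_longest from the standard library; append_differences is a hypothetical name.

```python
from itertools import zip_longest


def append_differences(first, second):
    """Hypothetical helper: words of `first`, then the words of `second`
    that differ from the word at the same position in `first`."""
    words1 = first.lower().split()
    words2 = second.lower().split()
    extra = []
    # zip_longest pads the shorter list with None instead of stopping early
    for s, t in zip_longest(words1, words2):
        if t is not None and s != t:
            extra.append(t)
    return " ".join(words1 + extra)


string1 = "This is a test trees are green roses are red"
string2 = "This iS a TEST trees 12.48.1952 anthony gonzalez"
print(append_differences(string1, string2))
# this is a test trees are green roses are red 12.48.1952 anthony gonzalez
print(append_differences(string2, string1))
# this is a test trees 12.48.1952 anthony gonzalez are green roses are red
```

The comparison is still purely positional, so it does not recognise a common run of words that is shifted between the two strings, as in example 3.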

You can use difflib.SequenceMatcher for this:

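import difflib

def merge(l, r):
    # l and r are sequences (lists of words, or plain strings)
    m = difflib.SequenceMatcher(None, l, r)
    for o, i1, i2, j1, j2 in m.get_opcodes():
        if o == 'equal':
            # common part: take it once, from the left sequence
            yield l[i1:i2]
        elif o == 'delete':
            # only in the left sequence
            yield l[i1:i2]
        elif o == 'insert':
            # only in the right sequence
            yield r[j1:j2]
        elif o == 'replace':
            # the sequences differ here: keep the left part, then the right part
            yield l[i1:i2]
            yield r[j1:j2]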

Use it like this:

[usage example missing from the source]
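Because merge yields slices of the input sequences, the pieces need to be flattened before they are joined. Here is a minimal sketch of word-level use, with merge repeated in condensed form so the block runs on its own; merge_words is a hypothetical wrapper name, not part of the answer.

```python
import difflib


def merge(l, r):
    m = difflib.SequenceMatcher(None, l, r)
    for o, i1, i2, j1, j2 in m.get_opcodes():
        if o in ('equal', 'delete'):
            yield l[i1:i2]
        elif o == 'insert':
            yield r[j1:j2]
        elif o == 'replace':
            yield l[i1:i2]
            yield r[j1:j2]


def merge_words(first, second):
    """Hypothetical wrapper: merge two strings word by word, ignoring case."""
    chunks = merge(first.lower().split(), second.lower().split())
    # each chunk is a list of words, so flatten before joining
    return " ".join(word for chunk in chunks for word in chunk)


string1 = "This is a test trees are green roses are red"
string2 = "This iS a TEST trees 12.48.1952 anthony gonzalez"
print(merge_words(string1, string2))
# this is a test trees are green roses are red 12.48.1952 anthony gonzalez
```

With the strings of example 3, the same call places "roses are red" before the common part, which is one of the two outcomes the question accepts.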

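If you want to merge at the character level instead, just change the call so that it operates directly on the strings (rather than on lists of words):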

>>> merged = merge(string1.lower(), string2.lower())
>>> ''.join(merged)
'this is a test trees 12.48.1952 arenthony gronzaleen roses are redz'

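This solution correctly preserves the order of the individual parts of the strings. If both strings end with a common part but have a differing segment before that ending, both differing segments still appear before the common ending in the result. For example, merging A B D and A C D gives A B C D.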

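As a result, each original string can be recovered, in the correct order, simply by deleting parts of the merged result. Deleting C from the example result gives the first string; deleting B gives the second. This also holds for more complex merges.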
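This property can be stated as "each original is a subsequence of the merged result", and it is easy to check. The sketch below uses a hypothetical helper, is_subsequence, together with the A B C D example and the finalstring from example 1 of the question.

```python
def is_subsequence(part, whole):
    """Hypothetical helper: True if `part` can be obtained from `whole`
    by deleting items without reordering the rest."""
    it = iter(whole)
    # `item in it` advances the iterator, so matches must occur in order
    return all(item in it for item in part)


merged = ["A", "B", "C", "D"]
print(is_subsequence(["A", "B", "D"], merged))  # True
print(is_subsequence(["A", "C", "D"], merged))  # True

final = "this is a test trees are green roses are red 12.48.1952 anthony gonzalez".split()
first = "This is a test trees are green roses are red".lower().split()
second = "This iS a TEST trees 12.48.1952 anthony gonzalez".lower().split()
print(is_subsequence(first, final), is_subsequence(second, final))  # True True
```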
