difflib.SequenceMatcher on more than two sequences

Posted 2024-04-26 05:59:20


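My end goal: I need a variant of zip_longest() that, given an arbitrary number of sequences, yields them side by side and fills the gaps with None wherever the sequences are not identical.

The equivalent when working with files is typing vimdiff file1 file2 file3 ...

For example, given the sequences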

a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]
c = ["foo", "bar"]

I need a function that yields these tuples:

("foo", "foo", "foo")
("bar", None, "bar")
("baz", "baz", None)
("asd", None, None)

I implemented this quite easily with difflib.SequenceMatcher. However, it only works on two sequences:

from difflib import SequenceMatcher

def zip_diff2(a, b, fillvalue=None):
    matches = SequenceMatcher(None, a, b).get_matching_blocks()
    for match, next_match in zip([None] + matches, matches + [None]):

        if match is None:
            # Process disjoined elements before the first match
            for i in range(0, next_match.a):
                yield a[i], fillvalue
            for i in range(0, next_match.b):
                yield fillvalue, b[i]
        else:
            for i in range(match.size):
                yield a[match.a + i], b[match.b + i]

            if next_match is None:
                a_end = len(a)
                b_end = len(b)
            else:
                a_end = next_match.a
                b_end = next_match.b

            for i in range(match.a + match.size, a_end):
                yield a[i], fillvalue
            for i in range(match.b + match.size, b_end):
                yield fillvalue, b[i]
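
For reference, the matching blocks that zip_diff2 walks through can be inspected directly. A minimal sketch using the example lists a and b from above (the variable name blocks is only illustrative):

```python
from difflib import SequenceMatcher

a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]

# get_matching_blocks() returns Match(a, b, size) triples describing runs
# that are equal in both sequences. The last triple is always a zero-size
# sentinel located at (len(a), len(b)).
blocks = SequenceMatcher(None, a, b).get_matching_blocks()
for block in blocks:
    print(block)
```

Everything that lies between two consecutive blocks is unmatched, and that is where zip_diff2 pads the other side with fillvalue.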

How can I make it work on an arbitrary number of sequences?


1 answer

Answer 1 · posted 2024-04-26 05:59:20

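To get the result you want, I think you first need to build a base sequence containing every value that appears in the given sequences. I wrote this code for that: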

def build_base_sequence(*sequences):
    # Getting the biggest sequence size.
    max_count = 0
    for sequence in sequences:
        max_count = max(max_count, len(sequence))

    # Normalizing the sequences to have all the same size.
    new_sequences = []
    for sequence in sequences:
        new_sequence = list(sequence) + [None] * max_count
        new_sequences.append(new_sequence[:max_count])

    # Building the base sequence:
    base_sequence = []
    for values in zip(*new_sequences):
        for value in values:
            if value is None or value in base_sequence:
                continue
            base_sequence.append(value)

    return base_sequence
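
The manual padding step above could probably be left to itertools.zip_longest, which already fills shorter sequences with None. A minimal sketch of that idea follows; the name build_base_sequence_compact is hypothetical and not part of the original answer:

```python
from itertools import zip_longest


def build_base_sequence_compact(*sequences):
    """Hypothetical compact variant: collect each distinct value once."""
    base_sequence = []
    # zip_longest pads the shorter sequences with None, so no manual
    # normalization to a common length is needed.
    for values in zip_longest(*sequences):
        for value in values:
            # None is treated as padding and skipped, as in the original.
            if value is not None and value not in base_sequence:
                base_sequence.append(value)
    return base_sequence


a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]
c = ["foo", "bar"]
print(build_base_sequence_compact(a, b, c))
```

As in the original function, None cannot appear as a real value in the input, because it doubles as the padding marker.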

You could use your own function and call it multiple times. However, I found difflib.SequenceMatcher too complicated to use, so I wrote my own code:

[code listing missing in the source]
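
As an illustration only (a reconstruction under assumptions, not necessarily the code originally posted), a zip_diff that is consistent with the output shown below could look like this. The fillvalue parameter and the compact helper body are assumptions made for the sketch:

```python
from itertools import zip_longest


def build_base_sequence(*sequences):
    # Compact stand-in for the helper described earlier: every distinct
    # value, in the order it is first seen when reading column by column.
    base = []
    for values in zip_longest(*sequences):
        for value in values:
            if value is not None and value not in base:
                base.append(value)
    return base


def zip_diff(*sequences, fillvalue=None):
    # Emit one row per value of the base sequence. Each column holds the
    # value if that sequence contains it, otherwise fillvalue.
    for value in build_base_sequence(*sequences):
        yield tuple(value if value in seq else fillvalue for seq in sequences)


a = ["foo", "bar", "baz", "asd"]
b = ["foo", "baz"]
c = ["foo", "bar"]
rows = list(zip_diff(a, b, c))
```

This sketch matches elements purely by value, so a value repeated within one sequence yields a single row, and row order follows the base sequence rather than a true diff alignment.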

It is somewhat newbie-ish, naive code, but hey, it does the job!

>>> a = ["foo", "bar", "baz", "asd"]
>>> b = ["foo", "baz"]
>>> c = ["foo", "bar"]
>>> for values in zip_diff(a, b, c):
...     print(values)
...
('foo', 'foo', 'foo')
('bar', None, 'bar')
('baz', 'baz', None)
('asd', None, None)
>>> 

I hope this helps.
