Python: Symmetric difference of sorted lists

Posted 2024-04-20 06:10:30


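Is there a good way in Python to get the symmetric difference of two sorted lists and return a sorted list? My current version seems like a poor workaround (convert to set, find the symmetric difference, convert back to list, then re-sort).

A solution using NumPy is fine; the data being sorted are ints.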

def sorted_symdiff(list1, list2):
    """ Each list is already sorted, this seems inefficient """
    s1, s2 = set(list1), set(list2)
    diff = list(s1.symmetric_difference(s2))
    diff.sort()
    return diff
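Since the values are ints and NumPy is acceptable, a minimal sketch could lean on `numpy.setxor1d`, which returns the sorted unique values found in exactly one of its inputs. The plain-Python equivalent collapses the convert/sort steps into a single `sorted(...)` call. Both function names below are hypothetical, and neither exploits the fact that the inputs are already sorted.

```python
import numpy as np


def sorted_symdiff_set(list1, list2):
    # `^` is the set symmetric-difference operator; sorted() already
    # returns a new list, so no separate list()/sort() step is needed.
    return sorted(set(list1) ^ set(list2))


def sorted_symdiff_np(list1, list2):
    # Force an int dtype so an empty input list is not inferred as float64.
    a = np.asarray(list1, dtype=np.int64)
    b = np.asarray(list2, dtype=np.int64)
    # setxor1d returns a sorted 1-D array of values present in exactly one
    # input; tolist() converts it back to a list of Python ints.
    return np.setxor1d(a, b).tolist()


print(sorted_symdiff_set([1, 2, 2, 5, 7], [2, 3, 7, 9]))  # [1, 3, 5, 9]
print(sorted_symdiff_np([1, 2, 2, 5, 7], [2, 3, 7, 9]))   # [1, 3, 5, 9]
```

`setxor1d` also accepts `assume_unique=True`, which skips its internal deduplication when each input is known to contain no repeats.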

2 answers

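Yes, there is a way. You have to take advantage of the fact that the two sequences are sorted. You walk through both at the same time, comparing elements one by one, and build the symmetric difference as you advance along each sequence.

If you are familiar with big-O notation, the complexity of the following code is O(m+n), where m = len(seq1) and n = len(seq2).

The complexity of your algorithm is O((m+n) log(m+n)), because you need to sort the resulting list.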

Caveat:

This answer is mostly an exercise to demonstrate how to take advantage of a sorted input.

Despite its better complexity, for most inputs this code runs slower than the original poster's code, which uses Python's built-in set methods. In CPython, sets are implemented in C under the hood, and pure Python has a hard time beating that. Very large inputs would be needed to see any advantage, if one shows up at all. This algorithm is asymptotically more efficient, but that does not mean it is faster, nor that you should use it: the built-in set methods are optimized, battle-tested C code, and they make for code that is simpler to write, read, understand, debug, and maintain.

Code:

def get_symmetric_difference(seq1, seq2):
    """
    Computes the symmetric difference of the unique elements of seq1 & seq2
    as a new sorted list, without mutating the parameters.

    seq1: a sorted sequence of int
    seq2: a sorted sequence of int

    return: a new sorted list containing the symmetric difference
            of the unique elements of seq1 & seq2
    """
    # No early return for empty inputs: returning seq2[:] would keep
    # duplicates (and a tuple, not a list); the loops below handle it.
    symmetric_difference = []

    idx = 0
    jdx = 0
    last_insert = None  # last value appended to the result
    last_seen = None    # last value found in both sequences

    # Merge-style walk: always advance past the smaller head.
    while idx < len(seq1) and jdx < len(seq2):
        s1 = seq1[idx]
        s2 = seq2[jdx]
        if s1 == s2:
            # Common value: skip it in both sequences.
            idx += 1
            jdx += 1
            last_seen = s1
        elif s1 < s2:
            # s1 can no longer occur in seq2; keep it unless it is a repeat.
            if last_insert != s1 and last_seen != s1:
                symmetric_difference.append(s1)
                last_insert = s1
            idx += 1
        else:
            if last_insert != s2 and last_seen != s2:
                symmetric_difference.append(s2)
                last_insert = s2
            jdx += 1

    # At most one sequence has elements left; append its unique leftovers.
    for tail in (seq1[idx:], seq2[jdx:]):
        for elt in tail:
            if last_insert != elt and last_seen != elt:
                symmetric_difference.append(elt)
                last_insert = elt

    return symmetric_difference
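A different linear-time technique that also exploits the sorted input is sketched below, assuming both inputs are ascending sequences of int: deduplicate each input with `itertools.groupby`, interleave the two deduplicated streams with `heapq.merge`, and keep only the values that appear exactly once in the merged stream. The function name `symdiff_sorted_merge` is hypothetical; this is an alternative to the answer's code, not a replacement for it.

```python
from heapq import merge
from itertools import groupby


def symdiff_sorted_merge(seq1, seq2):
    # On a sorted input, groupby yields each distinct value exactly once.
    unique1 = (key for key, _ in groupby(seq1))
    unique2 = (key for key, _ in groupby(seq2))
    # merge() lazily interleaves the two ascending streams in O(m + n);
    # a value present in both inputs therefore appears twice in a row.
    merged = merge(unique1, unique2)
    # Keep values whose run in the merged stream has length 1.
    return [key for key, run in groupby(merged) if sum(1 for _ in run) == 1]


print(symdiff_sorted_merge([1, 2, 2, 5, 7], [2, 3, 7, 9]))  # [1, 3, 5, 9]
```

The caveat above applies here too: `heapq.merge` is written in Python, so this is not expected to beat the C-implemented set methods on typical inputs.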

Tests:


Output:

***all tests pass***

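Never trust a set to be sorted. When you want to return a sorted list, always sort after converting the set to a list object. I am not sure about the behavior I observed in the explanation below.

You don't need to sort after converting back to a list, because the list is already sorted. Removing the extra sort will make it more efficient.

If list1 and list2 are guaranteed to be sorted lists of positive int objects, the resulting symmetric_difference set seems to come back sorted in Python 3.5. If list1 or list2 contain any negative int or {}, the result needs to be sorted again.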

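def sorted_symdiff(list1, list2):
    """ Each list is already sorted, this seems inefficient """
    s1, s2 = set(list1), set(list2)
    # No sort here: this relies on the set happening to iterate in ascending
    # order, which is not guaranteed (see the note at the top of this answer).
    diff = list(s1.symmetric_difference(s2))
    return diff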
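The behavior observed above is a CPython implementation detail, not a guarantee: for a small non-negative int, `hash(n) == n`, and iterating a set walks its hash-table slots in order, so small non-negative ints often come out ascending. Negative ints, and ints larger than the table size, land in other slots. A minimal sketch that keeps the explicit sort (the name `sorted_symdiff_safe` is hypothetical):

```python
def sorted_symdiff_safe(list1, list2):
    # Set iteration order is an implementation detail, so sort explicitly.
    # symmetric_difference accepts any iterable, so list2 needs no set().
    return sorted(set(list1).symmetric_difference(list2))


# The raw set order depends on the hash-table layout; on CPython,
# list({1, 8}) typically gives [8, 1] because 8 maps to slot 0.
print(list({1, 8}))
print(sorted_symdiff_safe([1, 2], [2, 8]))  # [1, 8]
```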
