寻找更具Python风格的列表比较解决方案

5 投票

6 回答

543 浏览

提问于 2025-04-16 21:57

好的，我有两个列表：

x = [1, 2, 3, 4]
y = [1, 1, 2, 5, 6]

我比较这两个列表，得到了以下输出：

x = [3, 4]
y = [1, 5, 6]

基本的想法是逐个检查这两个列表，看看它们有没有相同的元素。如果有相同的元素，就把其中一个移除，但只移除一个，不是全部。如果没有相同的元素，就保持不变。如果两个列表完全相同，最后会变成 x = [], y = []。

这是我写的一个比较粗糙的解决方案，我希望其他人能推荐一个更好或者更符合 Python 风格的方法。用三个循环似乎有点多...

    done = True

    while not done:
        done = False
        for x in xlist:
            for y in ylist:
                if x == y:
                    xlist.remove(x)
                    ylist.remove(y)
                    done = False
        print xlist, ylist

感谢你们总是花时间来阅读这个问题。亲亲！

集合操作数据结构编程风格算法优化循环效率列表比较元素移除

6 个回答

如果你不在乎顺序，可以用 collections.Counter 一行代码搞定：

>>> Counter(x)-Counter(y)
Counter({3: 1, 4: 1})

>>> Counter(y)-Counter(x)
Counter({1: 1, 5: 1, 6: 1})

如果你在乎顺序，那你可以通过遍历你的列表，从上面的字典中提取元素：

def prune(seq, toPrune):
    """Prunes elements from front of seq in O(N) time"""
    remainder = Counter(seq)-Counter(toPrune)
    R = []
    for x in reversed(seq):
        if remainder.get(x):
            remainder[x] -= 1
            R.insert(0,x)
    return R

示例：

>>> prune(x,y)
[3, 4]
>>> prune(y,x)
[1, 1, 5, 6]

回答于 2025-04-16 由 Python大师

分享举报

为了补充Gareth的回答：

>>> a = Counter([1, 2, 3, 4])
>>> b = Counter([1, 1, 2, 5, 6])
>>> (a - b).elements()
[3, 4]
>>> (b - a).elements()
[1, 5, 6]

基准测试代码：

from collections import Counter
from collections import defaultdict
import random

# short lists
#a = [1, 2, 3, 4, 7, 8, 9]
#b = [1, 1, 2, 5, 6, 8, 8, 10]

# long lists
a = []
b = []

for i in range(0, 1000):
    q = random.choice((1, 2, 3, 4))
    if q == 1:
        a.append(i)
    elif q == 2:
        b.append(i)
    elif q == 3:
        a.append(i)
        b.append(i)
    else:
        a.append(i)
        b.append(i)
        b.append(i)

# Modifies the lists in-place! Naughty! And it doesn't actually work, to boot.
def original(xlist, ylist):
    done = False

    while not done:
        done = True
        for x in xlist:
            for y in ylist:
                if x == y:
                    xlist.remove(x)
                    ylist.remove(y)
                    done = False
    return xlist, ylist # not strictly necessary, see above


def counter(xlist, ylist):
    x = Counter(xlist)
    y = Counter(ylist)
    return ((x-y).elements(), (y-x).elements())


def nasty(xlist, ylist):
    x = sum(([i]*(xlist.count(i)-ylist.count(i)) for i in set(xlist)),[])
    y = sum(([i]*(ylist.count(i)-xlist.count(i)) for i in set(ylist)),[])

    return x, y


def gnibbler(xlist, ylist):
    d = defaultdict(int)
    for i in xlist: d[i] += 1
    for i in ylist: d[i] -= 1
    return [k for k,v in d.items() for i in range(v)], [k for k,v in d.items() for i in range(-v)]

# substitute algorithm to test in the call
for x in range(0, 100000):
    original(list(a), list(b))

运行不够严格的基准测试[tm]（短列表是原始的，长列表是随机生成的，大约有1000个条目，里面有匹配和重复的情况，时间以原始算法的倍数表示）：

    100K iterations, short lists    1K iterations, long lists
Original     1.0                           1.0
Counter      9.3                           0.06
Nasty        2.9                           1.4
Gnibbler     2.4                           0.02

注意1：在小列表的情况下，创建Counter对象的时间似乎比实际算法的时间还要长。

注意2：在大约35个条目的列表长度下，原始算法和gnibbler的表现是一样的，超过这个长度后，gnibbler（和Counter）会更快。

回答于 2025-04-16 由 Python大师

分享举报

你可能在寻找的数据结构是多重集合（也叫“袋”），如果是这样的话，在Python中实现它的一个好方法是使用collections.Counter。

>>> from collections import Counter
>>> x = Counter([1, 2, 3, 4])
>>> y = Counter([1, 1, 2, 5, 6])
>>> x - y
Counter({3: 1, 4: 1})
>>> y - x
Counter({1: 1, 5: 1, 6: 1})

如果你想把Counter对象转换回包含重复元素的列表，可以使用elements这个方法：

>>> list((x - y).elements())
[3, 4]
>>> list((y - x).elements())
[1, 5, 6]

回答于 2025-04-16 由 Python大师

分享举报

寻找更具Python风格的列表比较解决方案

6 个回答

撰写回答