如何在python中比较两个排序的数字列表，其中每个对应的元素不必完全匹配？

3条回答

网友

1楼 · 编辑于 2024-05-15 07:56:29

感谢@MattMessersmith，我已经实现了我的最终解决方案，它满足了2个需求：

筛选两个列表中彼此太近的项

在两个彼此接近的列表中匹配项

list1 = [7, 22, 34, 49, 56, 62, 76, 82, 89, 149, 161, 182]
list2 = [7, 14, 49, 57, 66, 76, 135, 142, 161]
>>> result = match_approximate(list1, list2, 3)
>>> print result[0]
>>> print result[1]
[7, 49, 56, 76, 161]
[7, 49, 57, 76, 161]
>>> result = match_approximate(list1, list2, 1, True)
>>> print result[0]
>>> print result[1]
[22, 34, 62, 82, 89, 149, 182]
[14, 66, 135, 142]

代码如下：

    def match_approximate(a, b, approx, invert=False):
        a_ind, b_ind = 0, 0
        resulta, resultb = [], []
        while a_ind < len(a) and b_ind < len(b):
            aItem, bItem = a[a_ind], b[b_ind]
            if abs(aItem - bItem) <= approx:
                if not invert:
                    resulta.append(aItem)
                    resultb.append(bItem)
                a_ind += 1
                b_ind += 1
                continue
            if aItem < bItem:
                if invert:
                    resulta.append(aItem)
                a_ind += 1
            else:
                if invert:
                    resultb.append(bItem)
                b_ind += 1

        if invert:
            while a_ind != len(a):
                resulta.append(a[a_ind])
                a_ind += 1
            while b_ind != len(b):
                resulta.append(b[b_ind])
                b_ind += 1

        return [resulta, resultb]

网友

2楼 · 编辑于 2024-05-15 07:56:29

我不知道你为什么在这里排队。它似乎不是解决这个问题的正确数据结构。此外，Python有一个内置的队列数据结构（collections.deque，只使用popleft()而不是pop(0)）。你知道吗

一种更简单的方法（imo）是只维护指向每个数组的“指针”（或索引），从开始处开始。如果元素彼此在approx范围内，则添加它们，并递增两个指针。如果a的元素小于b的元素，则增加a的指针。否则，增加b的指针。继续，直到两个指针都用完（即指向列表的末尾）。它以O（N），线性时间运行。下面是上述算法的实现：

def match_approximate(a, b, approx=3):
    a_ind, b_ind = 0, 0
    result = []
    while a_ind < len(a) and b_ind < len(b):
        if abs(a[a_ind] - b[b_ind]) <= approx:
            result.append(b[b_ind])
        if a[a_ind] == b[b_ind]:
            b_ind += 1
            a_ind += 1
        elif a[a_ind] < b[b_ind]: a_ind += 1
        else: b_ind += 1

    def match_last_element(a, a_ind, last_elt_of_b, result):
        while a_ind != len(a):
            if abs(a[a_ind] - last_elt_of_b) <= approx:
                result.append(a[a_ind])
                a_ind += 1
            else:
                break

    if a_ind != len(a): match_last_element(a, a_ind, b[-1], result)
    else: match_last_element(b, b_ind, a[-1], result)

    return result

跑步

a = [7, 22, 34, 49, 56, 62, 76, 82, 89, 161, 163, 174]
b = [5, 6, 7, 14, 49, 57, 66, 76, 135, 142, 161]
print(match_approximate(a, b, 3))

将输出[5, 6, 7, 49, 57, 76, 161, 163]（这是我假设所期望的，但是请参见下面关于一些不清楚的边缘情况的内容）。如果您在当前impl上运行这个案例，您将得到一个IndexError。你知道吗

在某些边缘情况下，我们还不完全清楚该怎么办：

当你有一个近似匹配时，你会在结果中添加哪个元素？这个impl只是拉入b的元素（如示例所示）。
应如何处理重复项？在这个impl中，如果两个列表都包含一个dup，它将被添加两次。如果只有一个列表包含重复项，则只添加一次元素。关于dup的一个更复杂的例子是理解如果我们有[4,5]和[4,5]作为输入，那么我们的输出应该是[4,5]还是[4,4,5,5]（因为4和5都在approx之内，4也与5匹配）。
绑定的approx是包含的还是独占的（即<= approx或< approx）？

请随意调整上述impl，以处理您认为在这些情况下需要执行的任何操作。你知道吗

嗯。你知道吗

网友

3楼 · 编辑于 2024-05-15 07:56:29

从X中选取近似匹配：

import numpy as np

X = np.array([7,22,34,49,56,62,76,82,89,161,174])  #len:11
Y = np.array([7,14,49,57,66,76,135,142,161])       #len:9

dist = np.abs(Y[:, np.newaxis] - X)
#print(dist)
for i in range(len(Y)):
    for j in dist[i]:
        if -3<=j<=3: #approximation of 3
            idx = dist[i].tolist().index(j)
            print(X[idx])

输出：

从Y中选取近似匹配：

import numpy as np

X = np.array([7,22,34,49,56,62,76,82,89,161,174])  #len:11
Y = np.array([7,14,49,57,66,76,135,142,161])       #len:9

dist = np.abs(X[:, np.newaxis] - Y)
#print(dist)
for i in range(len(Y)+1):
    for j in dist[i]:
        if -3<=j<=3:
            #print(j)
            idx = dist[i].tolist().index(j)
            print(Y[idx])

输出：

相关问题更多 >

编程相关推荐

热门问题

热门文章