算法：从列表中选择点

1 投票

6 回答

922 浏览

提问于 2025-04-18 04:58

我在寻找合适的算法来解决我的问题时遇到了困难。
我有两个大的数字列表，列表A和列表B。列表A里的数字是排好序的实数，列表B里的数字也是排好序的实数。我想创建一个列表C，里面包含从列表B中选择的点。选择的方式是这样的：对于列表A中的每一个元素（数字），我需要在列表B中找到离它最近的数字（也就是绝对值差最小的）。我会给一些例子，因为这样解释比较难。
例子1：

A= [20.0, 19.6, 19.2]
B= [22.454, 22.224, 21, 20.1, 19.76, 19.72, 19]

C= [20.1, 19.72, 19]

条件：
列表C不能有重复的点。例如，如果列表A中的两个元素对应列表B中的同一个点（因为这个点是它们各自最近的点），那么列表A中的一个点必须选择列表B中的第二近的点。
例子2：

A= [20.0, 19.6, 19.2]
B= [22.454, 22.224, 21, 20.3, 19.9, 19, 18]

C= [20.3, 19.9, 19]
# I it must not choose  C= [19.9, 19, 18] because its not an optimum solution

注意：
列表B的元素数量总是比列表A多，至少多4倍。所以如果列表A有10个元素，列表B至少会有40个元素。
所以让我快速再解释一下，列表C必须包含从列表B中选择的、排好序且不重复的元素，这些选择的点必须尽可能接近列表A中的每个元素。
我希望我的解释和英语足够清楚 :)
请帮我找一个好的方法在Python 2.7中实现这个。谢谢大家！

数据结构算法列表处理重复元素数字排序选择算法最近邻最小绝对差

6 个回答

这是我想到的伪代码：

b := 0
For each element a in array A:
    while b < length(B) and a < B[b]:
        b++
    if b == length(B):
        Push B[-1] into C
    else if b == 0:
        Push B[0] into C
    else:
        if abs(B[b] - a) < abs(B[b - 1] - a):
            Push B[b]
        else:
            Push B[b - 1]
            b++

回答于 2025-04-18 由 Python大师

分享举报

这可以看作是一个线性规划的分配问题（或者说是最大二分匹配问题），也就是说你需要把一些资源分配到目的地，并且尽量让成本最低。

List A is your list of sources.  
List B is your list of destinations.  
Since list B is larger, you have to create dummy sources in List A to make the problem balanced.
The cost of assigning element i from list A to element j of List B = abs(A[i] - B[j])

现在可以使用匈牙利算法来解决这个问题，复杂度大约是 O(n^3)。

回答于 2025-04-18 由 Python大师

分享举报

使用二分查找

def binary(arr, val):
    assert len(arr) > 0
    if len(arr) == 1:
        return 0
    l2 = len(arr) // 2
    if abs(arr[l2-1] - val) > abs(arr[l2] - val):
        return l2 + binary(arr[l2:], val)
    else:
        return binary(arr[0:l2], val)

def closest_points(A, B):
    C = []
    for val in A:
        idx = binary(B, val)
        C.append(B[idx])
        B.pop(idx)
    return C

A = [20.0, 19.6, 19.2]
B = [22.454, 22.224, 21, 20.1, 19.76, 19.72, 19]
A.reverse()
B.reverse()
C = closest_points(A, B)
C.reverse()
assert C == [20.1, 19.72, 19]

A = [20.0, 19.6, 19.2]
B = [22.454, 22.224, 21, 20.3, 19.9, 19, 18]
A.reverse()
B.reverse()
C = closest_points(A, B)
C.reverse()
assert C == [20.3, 19.9, 19]

复杂度是 len(A) * log(len(B))

实际上，它确实可以在你的例子中工作，但并不符合你的要求（就像其他基于二分查找的解决方案一样 :( ）正确的方法是使用迭代最近点算法，而不是暴露复杂度。

回答于 2025-04-18 由 Python大师

分享举报

使用 bisect 模块：

import bisect

def solve(A, B):

    A.reverse()
    B.reverse()
    C, indices = [], []

    for x in A:

        i = bisect.bisect_left(B, x)
        if i == 0:
            C.append(B[0])
            indices.append(0)
        elif i == len(B):
            C.append(B[-1])
            indices.append(len(B)-1)
        else:
            minn = min((i-1, i, i+1), key=lambda y:abs(x-B[y]))
            C.append(B[minn])
            indices.append(minn)
    seen = set()
    for i, x in enumerate(C):
        if x in seen:
            C[i] = B[indices[i]+1]
            seen.add(B[indices[i]+1])
        else:
            seen.add(x)

    C.reverse()
    return C

A= [20.0, 19.6, 19.2]
B= [22.454, 22.224, 21, 20.1, 19.76, 19.72, 19]
assert solve(A, B) == [20.1, 19.72, 19]

A= [20.0, 19.6, 19.2]
B= [22.454, 22.224, 21, 20.3, 19.9, 19, 18]
assert solve(A, B) == [20.3, 19.9, 19]

A = [20,17,14,11]
B = [22,21,20,19,18,17,16]
assert solve(A, B) == [20, 18, 17, 16]

回答于 2025-04-18 由 Python大师

分享举报

这里有一个简单的动态规划解决方案。我们把状态定义为dp[i][j]，它表示在我们有listB从0到j的值时，listA中从0到i的元素的最佳匹配值。

对于每个元素，我们有两个选择：要么把listA[i]和listB[j]匹配起来，要么不匹配。所以，dp的关系式是：

dp[i][j] = min(dp[i-1][j-1] + abs(listA[i]-listB[j]), dp[i][j-1]);

这里有实现这个动态规划解决方案的Java代码，它采用自上而下的方法，并使用了记忆化技术，代码可以在这里找到。

public class Main {     
    static double[] a = {20.0, 19.6, 19.2};
    static double[] b = {22.454, 22.224, 21, 20.3, 19.9, 19, 18};  // declare the 2 arrays
    static boolean[][] takes; // tells about the dp state.
    static double INF = 1000.0;
    static double[][] dp;         // gives the answer for the above defined dp states

    public static void main(String[] args){
        dp = new double[a.length][b.length];
        takes = new boolean[a.length][b.length];

        for(int i=0; i<dp.length; i++)
            Arrays.fill(dp[i], Double.MAX_VALUE);
        System.out.println(maxMatching(a.length-1, b.length-1));
        printSolution(a.length-1, b.length-1);
    }

    static void printSolution(int i, int j){
        if(i==-1 || j==-1)return;
        if(takes[i][j]){
            System.out.println(a[i] + " " + b[j]);
            printSolution(i-1, j-1);
        }else{
            printSolution(i, j-1);
        }
    }

    static double maxMatching(int i, int j){
        //if there are no numbers left to be matched
        if(i==-1)                        return 0;
        //If there are numbers to match but no number to which we can match
        if(j==-1)                        return INF;
        //If we have already solved this problem 
        if(dp[i][j] != Double.MAX_VALUE) return dp[i][j];

        double val1 = maxMatching(i, j-1); // ith number of listAis not matched with jth number of listB
        double val2 = Math.abs(a[i] - b[j]) + maxMatching(i-1, j-1);
        // When ith number is matched with the jth number

        // take the minimum of the above two.
        if(Double.compare(val1, val2)>0){
            dp[i][j] = val2;
            takes[i][j] = true;
        }else{
            dp[i][j] = val1;
        }
        return dp[i][j];
    }
}

运行时间分析：这段代码的运行时间是O(NM)，其中N是list1的长度，M是list2的长度。这是因为总共有O(NM)个子问题，而每个子问题的处理时间是O(1)。

补充说明：这个解决方案与其他两个O(nlogn)的解决方案相比在所有测试用例上都能正常工作，并且总能找到正确的答案。我认为甚至不存在一个能保证O(nlogn)的解决方案，能够正确解决所有版本的问题。

回答于 2025-04-18 由 Python大师

分享举报

算法：从列表中选择点

6 个回答

撰写回答