从总值最高的2个数组中选择N个数中的k个

2024-04-19 16:22:47 发布

您现在位置:Python中文网/ 问答频道 /正文

ABC为三个数组,每个数组包含N个数:

 A = a[0], a[1], a[2], ..., a[N-1] 
 B = b[0], b[1], b[2], ..., b[N-1]
 C = c[0], c[1], c[3], ..., c[N-1] 

我想从A中选择最好的k < N元素,从B中选择最好的k < N元素,以便使它们的值的总和最大化。有趣的变化是:如果元素i是从AB(其中i{0, ..., N-1}中是索引)中选择的,那么它们将贡献c[i]而不是贡献a[i] + b[i]的这些元素

乍一看,我觉得这似乎很简单,但我想得越多,它就参与得越多

我最终在寻找一个Python实现,但在这个阶段,我只是想弄清楚什么是一个高效的算法


示例

为了澄清,算法的输入是3N x 1数组ABC,以及k的整数值。预期的输出是两个k x 1索引列表,定义了AB(和C)元素的值最大化组合

例如,假设k = 2N = 4并让

 A = a[0], a[1], a[2], a[3] = 3, 1, 1, 0   
 B = b[0], b[1], b[2], b[3] = 1, 3, 0, 1  
 C = c[0], c[1], c[2], c[3] = 4, 4, 3, 2

即使在这个简单的例子中,也有许多可能的组合。例如,如果元素i = 0, 2是从A中选择的,元素j = 1, 3是从B中选择的,那么总值将是a[0] + a[2] + b[1] + b[3] = 8

另一方面,如果元素i = 0, 1j = 0, 1将从AB中选择,则特殊的扭曲适用:不是产生a[0] + a[1] + b[0] + b[1],而是由c[0] + c[1] = 8给出总值

在本例中,使总值最大化的元素组合由来自Ai = 0, 2和来自B的元素j = 1, 2给出。这将产生a[0] + b[1] + c[2] = 9的总值,可以验证该值比任何其他组合都高


答案比较

下面是提交的3个解决方案的快速比较。首先,我检查了所有这些,它们都给出了预期的结果。作为旁注,它们都不要求C的元素弱于AB中相应元素的总和,因此我在绩效评估中放弃了这一假设

以下是我运行的内容:

import numpy as np
from utils import tic, toc  # simple wrapper to time.perf_counter()

k, N = 10, 1000

A = list(np.random.random_sample([N]))
B = list(np.random.random_sample([N]))
C = list(np.random.random_sample([N]))

tic()
print(optimal_choices(k, A, B, C))  # solution by btilly
toc()

tic()
print(maxPicks(A.copy(), B.copy(), C.copy(), k))  # solution by Eric T-M
toc()

tic()
print(maxSum(A, B, C, k))  # solution by Alain T.
toc()

我测试了kN的各种组合。似乎只要k很小,@btilly的算法就可以在N中很好地伸缩@Alain-T.的算法正好相反,当k相对于N较大时,表现良好。综上所述,@Eric-T-M的算法做得最好,在kN中都能很好地伸缩

小问题:k=10和N=500

  • btilly算法:0.49s
  • Eric T-M算法:0.00s
  • 阿兰T.算法:0.52s

小k、大N:k=10和N=1000

  • btilly算法:0.89s
  • Eric T-M算法:0.00s
  • 阿兰T.的算法:1.99s

大k、小N:k=80和N=100

  • btilly算法:1.54s
  • Eric T-M算法:0.00s
  • 阿兰T.的算法:0.09s

中等问题:k=50和N=1000

  • btilly算法:13.01ss
  • Eric T-M算法:0.00s
  • 阿兰T.的算法:8.55s

大问题1:k=10和N=1\u 000

  • Eric T-M算法:1.03s

大问题2:k=1_000和N=100_000

  • Eric T-M算法:10.22s

(对于基准测试,我删除了Alain T.代码中的排序,以使其具有可比性。)


Tags: sample算法元素bynprandom数组tic
3条回答

这可以用动态规划来解决

# Helper function to get out of the data structure.
def get_nested_array (data, path):
    for x in path:
        if data is None or len(data) <= x:
            return None
        else:
            data = data[x]
    return data

# Helper function to set data in the data structure.
def set_nested_array (data, path, value):
    # Navigate there
    for x in path[0:len(path)-1]:
        while len(data) <= x:
            data.append([])
        if data[x] is None:
            data[x] = []
        data = data[x]
    while len(data) <= path[-1]:
        data.append(None)
    data[path[-1]] = value

# Is this option better than what is there?  If so, then add it.
def possibly_add_choice (best_choice, pos, i, j, current_sum, last_i, last_j):
    best_prev = get_nested_array(best_choice, [pos, i, j])
    if best_prev is None or best_prev[0] < current_sum:
        set_nested_array(best_choice, [pos, i, j], (current_sum, last_i, last_j))

# Our program.
def optimal_choices (k, A, B, C):
    # best_choice[pos][i][j] = (max_sum, last_a, last_b)
    # where:
    #    We have made all choices in the range 0..pos-1
    #    We chose i out of A
    #    We chose j out of B
    # and
    #    max_sum is the best possible sum
    #    last_a is the last index chosen from a
    #    last_b is the last index chosen from b
    # then we can find the answer working backwards from
    # best_choice[len(A)][k][k]
    #
    best_choice = []

    # Enter the empty set answer
    set_nested_array(best_choice, [0, 0, 0], (0, None, None))

    for pos in range(len(A)):
        best_choice.append([])
        best_choice_for_pos = best_choice[pos]
        for i in range(k+1):
            if len(best_choice_for_pos) <= i:
                break
            best_choice_for_i = best_choice_for_pos[i]
            for j in range(k+1):
                if len(best_choice_for_i) <= j:
                    break
                last_sum, last_i, last_j = best_choice_for_i[j]

                # Try all 4 things we can choose here.  Nothing, or A or B or both.
                possibly_add_choice(best_choice, pos+1, i, j, last_sum, last_i, last_j)
                possibly_add_choice(best_choice, pos+1, i+1, j, last_sum + A[pos], pos, last_j)
                possibly_add_choice(best_choice, pos+1, i, j+1, last_sum + B[pos], last_i, pos)
                possibly_add_choice(best_choice, pos+1, i+1, j+1, last_sum + C[pos], pos, pos)

    # Now we have the answer, it is just a question of decoding it.
    if get_nested_array(best_choice, [len(A), k, k]) is None:
        return (None, None)
    else:
        choose_a = []
        choose_b = []
        best_spot = [len(A), k, k]
        max_sum, last_i, last_j = get_nested_array(best_choice, best_spot)
        while last_i is not None or last_j is not None:
            # Figure out where we last had a choice and what was chosen.
            if last_i is None:
                last_pos = last_j
                i_dec = 0
                j_dec = 1
            elif last_j is None:
                last_pos = last_i
                i_dec = 1
                j_dec = 0
            else:
                last_pos = max(last_i, last_j)
                i_dec = 0
                j_dec = 0
                if last_pos == last_i:
                    i_dec = 1
                if last_pos == last_j:
                    j_dec = 1

            # record the choice.
            if 1 == i_dec:
                choose_a.append(last_pos)
            if 1 == j_dec:
                choose_b.append(last_pos)

            # Go back to that spot
            max_sum, last_i, last_j = get_nested_array(best_choice, [last_pos, k-len(choose_a), k-len(choose_b)])

        # We walked backwards to generate these lists.  So they are currently reversed.
        return (list(reversed(choose_a)), list(reversed(choose_b)))

print(optimal_choices(2, [3, 1, 1, 0 ], [1, 3, 0, 1], [4, 4, 3, 2]))

如果将A和B展开为一个索引对列表,其中包含各自的和(应用C中的异常),则可以迭代地获取最大和,并在每个步骤中排除相应的对。这将在每次迭代时从剩余对中选择可能的最高总数:

def maxSum(A,B,C,K):
    S = [ ([a+b,C[i]][i==j],i,j) for i,a in enumerate(A)  
                                 for j,b in enumerate(B)]
    usedA,usedB  = set(),set()
    for _ in range(K):
        _,i,j = max(s for s in S if not(s[1] in usedA or s[2] in usedB))
        usedA.add(i)
        usedB.add(j)
    return sorted(usedA),sorted(usedB)

输出:

A = [3, 1, 1, 0]   
B = [1, 3, 0, 1]  
C = [4, 4, 3, 2]
print(maxSum(A,B,C,2)) # ([0, 2], [1, 2])   total is 9

A = [1,2,3,4]
B = [4,5,6,2]
C = [5,9,9,6]

print(maxSum(A,B,C,2)) # ([1, 3], [1, 2])  total is 19

print(maxSum(A,B,C,3)) # ([1, 2, 3], [0, 1, 2]) total is 26

试试这个。它需要O(N^2)时间,而且相当简单

def maxPicks(A,B,C,k):
    # returns the tuple (list of entries picked in A, list of entries picked in B, total value)

    # BASE CASE
    if k == 0:
        return ([], [], 0)
    aMax = max(A)
    bMax = max(B)
    cMax = max(C)

    if (aMax + bMax) > cMax:
        aIdx = A.index(aMax)
        bIdx = B.index(bMax)
        B[aIdx] = C[aIdx] - A[aIdx]
        A[aIdx] = -2
        C[aIdx] = -1
        A[bIdx] = C[bIdx] - B[bIdx]
        B[bIdx] = -2
        C[bIdx] = -1
        nextPicks = maxPicks(A,B,C,k-1)
        return (nextPicks[0] + [aIdx], nextPicks[1] + [bIdx], nextPicks[2] + aMax + bMax)
    else:
        cIdx = C.index(cMax)
        A[cIdx] = -1
        B[cIdx] = -1
        C[cIdx] = -1
        nextPicks = maxPicks(A,B,C,k-1)
        return (nextPicks[0] + [cIdx], nextPicks[1] + [cIdx], nextPicks[2] + cMax)

下面是它的工作原理:

基本情况应该是不言自明的。否则,我们将比较A中所有条目的最大值和B中所有条目的最大值与C中所有条目的最大值之和。如果此总和大于从AB中选取这些条目的安全值,但在进行更多选取之前,我们需要将选取的条目以及它们在C中的对应条目设置为负值。作为旁注,我确实假设a、B和C中的所有值最初都是非负的,因此通过将它们设置为负,我们禁止我们的算法再次拾取它们。如果这个假设是错误的,您可能希望将这些值设置为极端负的值,以禁止重复拾取。我们还看到,如果我们选择了A[i],那么B[i]的值现在是C[i]-A[i]的值,因为选择B[i]将丢失A[i]中的值,如果我们选择B[j],则为条目A[j]提供相同的C[i]中的值

另一方面,如果C中的最大条目大于或等于aMax+bMax,我们希望选择它(通过选择AB中的相应条目,因为AB中的其他条目选择或仅仅是^}将更有价值。此时我们知道我们不想重新选择A[i],B[i],或C[i],所以我们将它们都设置为负值

相关问题 更多 >