Using next in a list comprehension

1 answer

User

#1 · Posted 2024-06-16 09:32:39

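I think I understand your problem.

Problem description

Given:

num_items - the number of available items
targets - a list of potential targets, each with a value
threshold - the cutoff limit

Task:

Select the first num_items elements of targets whose value is greater than or equal to threshold.
Return the array index (counting from 1) of the last selected element in targets, or 0 if not enough targets qualify. (An odd decision; I would have chosen 0-based indices and returned len(targets) when nothing is found, but fine.)
Optimize for speed. targets and num_items are the same every time; threshold is the only value that changes.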

Example

num_items = 3
targets = [5,3,4,1,3,3,7,4]
threshold = 4

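The selected targets are the ones at positions [0,2,6], with values [5,4,7], because these are the first 3 values greater than or equal to threshold. We only need the index of the last one, which is 6 here (0-based; the 1-based return value is 7).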
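Since the question is about next, here is a minimal sketch of how the same selection could be written with a generator expression, itertools.islice and next. The function name choose_first_n_next is my own, and the sketch assumes num_items is at least 1:

```python
from itertools import islice

def choose_first_n_next(num_items, targets, threshold):
    """Return the 1-based index of the num_items-th target >= threshold, or 0."""
    # Lazily yield the 0-based positions of all qualifying targets.
    matches = (i for i, t in enumerate(targets) if t >= threshold)
    # Skip the first num_items - 1 matches and take the next one. The default
    # of -1 turns into 0 after the +1 shift when there are too few matches.
    return next(islice(matches, num_items - 1, None), -1) + 1

print(choose_first_n_next(3, [5, 3, 4, 1, 3, 3, 7, 4], 4))  # 7 (0-based position 6)
```

This still scans the list once per threshold, so it is a tidier spelling of the linear search, not a faster algorithm.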

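Approach

Your original idea was to iterate over all the people. That is fast when the threshold is low, but slow when the threshold is high, because we have to keep iterating until we have found enough candidates.

I rewrote your original idea of iterating over all of them, because I could not follow your code: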

def choose_first_n(num_items, targets, threshold):
    for target_id, target in enumerate(targets):
        if target >= threshold:
            num_items -= 1
            if num_items == 0:
                return target_id + 1
    return 0

def baker_queue(num_loaves_per_day, people_max_waiting_time, required_baking_times):
    results = []
    for today_baking_time in required_baking_times:
        results.append(choose_first_n(num_loaves_per_day, people_max_waiting_time, today_baking_time))
    return results

print(baker_queue(3,
                  [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8],
                  [1, 2, 5, 4, 5, 4, 7]))
# Returns: [3, 4, 15, 7, 15, 7, 19], as in the original code.
# Also, please provide expected return values in future, like I did here.

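Using a heap is an interesting idea, but I don't think we benefit from it here. Heaps are only really fast for removing and inserting items, and we do neither. We just iterate over them.

The fastest approach I can think of is to preprocess the targets list into something more efficient, a kind of "index" that maps a threshold to the position of the last selected item.

Demonstration: we reuse the previous code and look at the result as a function of the threshold: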

def choose_first_n(num_items, targets, threshold):
    for target_id, target in enumerate(targets):
        if target >= threshold:
            num_items -= 1
            if num_items == 0:
                return target_id + 1
    return 0

targets = [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8]
num_items = 3

for threshold in range(10):
    result = choose_first_n(num_items, targets, threshold)
    print(f"Threshold: {threshold}, Result: {result}")

Threshold: 0, Result: 3
Threshold: 1, Result: 3
Threshold: 2, Result: 4
Threshold: 3, Result: 4
Threshold: 4, Result: 7
Threshold: 5, Result: 15
Threshold: 6, Result: 15
Threshold: 7, Result: 19
Threshold: 8, Result: 19
Threshold: 9, Result: 0

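As you can see, the result never decreases as the threshold rises: it is a monotonically non-decreasing step function of the threshold (ignoring the 0 returned for threshold 9).

If we can compute the threshold values at which the result changes, we can look up the result directly with a binary search (divide and conquer), which is much faster than iterating over the list: O(log n) instead of O(n), in case you are familiar with big-O notation.

One thing to note is that the last result is 0, which breaks this scheme. That is why it is beneficial to let the indices start at 0 instead of 1, and to let the "error" case be len(targets) instead of 0.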
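To illustrate the idea on the demo data, here is a small sketch. The breakpoints are read off the printed table by hand (they are not computed here), the results are converted to 0-based indices, and 19 = len(targets) stands in for "not found". The names breakpoints, results, NOT_FOUND and lookup are my own:

```python
import bisect

# Largest threshold that still yields each result, taken from the table above.
breakpoints = [1, 3, 4, 6, 8]
# Matching 0-based results (the printed 1-based values minus 1).
results = [2, 3, 6, 14, 18]
# "Not found" marker: len(targets) for the 19-element demo list.
NOT_FOUND = 19

def lookup(threshold):
    # Find the first breakpoint that is >= threshold via binary search.
    pos = bisect.bisect_left(breakpoints, threshold)
    return results[pos] if pos < len(results) else NOT_FOUND

for threshold in range(10):
    print(threshold, lookup(threshold))
```

With the 0-based convention the whole mapping, including the error case, is non-decreasing, which is exactly what binary search needs.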

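Preprocessing

The hardest part is the preprocessing that produces this mapping.

Let's look at it from a different angle.

For simplicity, assume num_items is 3 and we have 10 targets. Does the last selected target lie within the first 5 targets?

The answer is: yes, if at least 3 of the first 5 targets are greater than or equal to the threshold. In other words, the third largest number among them is the deciding factor. If the threshold is higher than that third largest number, the last selected target lies beyond the first 5 targets (or does not exist at all).

So for every item we need the third largest number among the items up to and including it. Interestingly, this is where the heap actually comes in handy ;)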
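As a quick sanity check of that claim, here is a sketch that tests a single prefix. The helper names kth_largest and selected_within_prefix are hypothetical, and heapq.nlargest is used only for brevity:

```python
import heapq

def kth_largest(values, k):
    # nlargest returns the k largest values in descending order,
    # so the last one is the k-th largest.
    return heapq.nlargest(k, values)[-1]

def selected_within_prefix(targets, num_items, threshold, prefix_len):
    """True if the num_items-th qualifying target lies in targets[:prefix_len]."""
    prefix = targets[:prefix_len]
    # A prefix with fewer than num_items elements can never contain the answer.
    return len(prefix) >= num_items and kth_largest(prefix, num_items) >= threshold

targets = [5, 3, 4, 1, 3, 3, 7, 4]
print(selected_within_prefix(targets, 3, 3, 5))  # True: 5, 3, 4 are all >= 3
print(selected_within_prefix(targets, 3, 4, 5))  # False: the third match is at position 6
```

Recomputing nlargest for every prefix would be wasteful, which is why the real preprocessing keeps a running heap.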

Implementation

import heapq
import bisect

def preprocess(targets, num_items):
    # Our min-heap; it holds the num_items largest targets seen so far,
    # so its root is the num_items-th largest of them
    largest_targets_heap = []

    # Our first preprocessing result: for every item, the third largest
    # number (in general the num_items-th largest) among all targets from
    # the first item up to and including the current item.
    third_largest_number_per_target = []

    # Compute the third largest previous value for every target
    for target in targets:
        heapq.heappush(largest_targets_heap, target)
        if len(largest_targets_heap) > num_items:
            heapq.heappop(largest_targets_heap)

        current_third_largest = largest_targets_heap[0]
        third_largest_number_per_target.append(current_third_largest)

    # We now have the third largest number for every target.
    # Now, consolidate that data into a lookup table, to prevent duplication.
    # Therefore, find the first occurrence of every number
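    lookup_table_indices = []
    lookup_table_values = []

    # Guard: with fewer than num_items targets no threshold can be satisfied,
    # so return an empty table (every lookup then yields len(targets)).
    if len(targets) < num_items:
        return (lookup_table_values, lookup_table_indices, num_items, len(targets))

    current_value = third_largest_number_per_target[num_items - 1]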

    # Push the (num_items-1)th value to account for the fact our heap wasn't filled up until the
    # first num_items were processed
    lookup_table_indices.append(num_items - 1)
    lookup_table_values.append(current_value)

    # Fill the rest of the lookup table
    for index, value in enumerate(third_largest_number_per_target):
        if index < num_items - 1:
            continue
        if value != current_value:
            lookup_table_indices.append(index)
            lookup_table_values.append(value)
            current_value = value

    # The lookup table we have, consisting of values, indices, a minimum and a maximum value
    lookup_table = (lookup_table_values, lookup_table_indices, num_items, len(targets))

    return lookup_table

def choose_first_n_preprocessed(lookup_table, threshold):
    (lookup_table_values, lookup_table_indices, min_value, max_value) = lookup_table

    # We need to find the first (value,index) pair in lookup table where value is larger or equal to threshold
    # We do this by using bisect, which is really fast. This is only possible because of our preprocessing.
    position = bisect.bisect_left(lookup_table_values, threshold)

    # If we didn't find a result in the preprocessed table, we return the max value, to indicate that the
    # threshold is too high.
    if position >= len(lookup_table_indices):
        return max_value

    # Read the result from the table of indices
    value = lookup_table_indices[position]
    return value

def baker_queue(num_loaves_per_day, people_max_waiting_time, required_baking_times):
    # Create the preprocessed lookup table
    lookup_table = preprocess(people_max_waiting_time, num_loaves_per_day)

    # For every day, compute the result
    results = []
    for today_baking_time in required_baking_times:
        # Use our fast lookup based algorithm now
        result = choose_first_n_preprocessed(lookup_table, today_baking_time)
        
        # Convert indices back to starting with 1, and 0 in error case, as
        # the original format was
        if result == len(people_max_waiting_time):
            results.append(0)
        else:
            results.append(result+1)
    return results

print(baker_queue(3,
                  [1, 4, 4, 3, 1, 2, 6, 1, 9, 4, 4, 3, 1, 2, 6, 9, 4, 5, 8],
                  [1, 2, 5, 4, 5, 4, 7]))
# [3, 4, 15, 7, 15, 7, 19]
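If you want more confidence than a single example gives, a randomized cross-check against the linear search is cheap to write. The sketch below is self-contained and uses a compact single-pass variant of the table construction, not the exact code above. All names (naive, build_table, fast, cross_check) are my own, and it assumes num_items is at least 1:

```python
import bisect
import heapq
import random

def naive(num_items, targets, threshold):
    """Reference: 1-based index of the num_items-th target >= threshold, else 0."""
    remaining = num_items
    for i, t in enumerate(targets):
        if t >= threshold:
            remaining -= 1
            if remaining == 0:
                return i + 1
    return 0

def build_table(targets, num_items):
    """Return (values, indices): the strictly increasing k-th largest values and
    the first 0-based position at which each value is reached."""
    heap, values, indices = [], [], []
    for i, t in enumerate(targets):
        heapq.heappush(heap, t)
        if len(heap) > num_items:
            heapq.heappop(heap)  # keep only the num_items largest so far
        # Record an entry once the heap is full and its root has changed.
        if len(heap) == num_items and (not values or heap[0] != values[-1]):
            values.append(heap[0])
            indices.append(i)
    return values, indices

def fast(table, threshold):
    values, indices = table
    pos = bisect.bisect_left(values, threshold)
    return indices[pos] + 1 if pos < len(indices) else 0

def cross_check(trials=200, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        targets = [rng.randint(0, 9) for _ in range(rng.randint(0, 30))]
        k = rng.randint(1, 5)
        table = build_table(targets, k)
        for threshold in range(11):
            assert fast(table, threshold) == naive(k, targets, threshold)
    return True

print(cross_check())  # True if every comparison agrees
```

The small value range (0 to 9) is deliberate: it produces many duplicates and many "not enough targets" cases, which are the edge cases most likely to go wrong.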

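Theoretical analysis

This should now be much faster, especially for many days, but also for many people.

The complexity of the naive implementation is

O(days * people)

The complexity of the preprocessed implementation is

O(people * log(bread) + days * log(people))

That may not sound very different, but it is. It basically says that if the limiting factor is the number of people, the number of days hardly matters, and if the limiting factor is the number of days, the number of people hardly matters.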
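To get a feeling for the gap, you can plug some sizes into the two formulas. This is only a rough operation-count estimate that ignores constant factors; the function names and the sample sizes are illustrative assumptions:

```python
import math

def naive_ops(days, people):
    # O(days * people): one pass over all people for every day.
    return days * people

def preprocessed_ops(days, people, bread):
    # O(people * log(bread)) for the heap-based preprocessing,
    # plus O(days * log(people)) for one binary search per day.
    return people * math.log2(bread) + days * math.log2(people)

days, people, bread = 10_000, 10_000, 900
print(f"naive:        {naive_ops(days, people):.3g}")
print(f"preprocessed: {preprocessed_ops(days, people, bread):.3g}")
```

For these sizes the naive count is 10^8, while the preprocessed count is on the order of 10^5. Real timings will differ because the constants are not the same, but the order of magnitude of the gap is what matters.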

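Benchmark results

Setup:

900 loaves per day
10,000 people
10,000 days

Results:

Naive: 2.13 seconds
Preprocessed: 0.012 seconds

I then pushed the input size until the preprocessed algorithm also took about 2 seconds, and ended up with these numbers:

90,000 loaves per day
1,000,000 people
1,000,000 days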

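I did not run these numbers on the naive algorithm, but the math says it would take roughly 2,000,000 seconds, or about 23 days. (Scaling the measured 2.13 seconds linearly with days * people, a factor of 10,000, gives a lower estimate of roughly 21,000 seconds, or about 6 hours. Either way it is far slower than the preprocessed version.)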

That took a while; I hope it was worth it ;)

I think this is my longest post so far, and it was a really interesting task.

I hope you appreciate it.

Regards
