Python algorithm challenge?

Asked 2025-04-16 21:57

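I have a Python function (call it myFunction) that takes a list of numbers as input and, after a complex calculation, returns a single number as the result.

The function looks like this: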

def myFunction( listNumbers ):
    # initialize the result of the calculation
    calcResult = 0

    # looping through all indices, from 0 to the last one
    for i in xrange(0, len(listNumbers), 1):
        # some complex calculation goes here, changing the value of 'calcResult'
        pass  # placeholder so that this skeleton parses

    # let us now return the result of the calculation
    return calcResult

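I have tested the function, and it works as expected.

Normally, myFunction receives a listNumbers argument containing 5,000,000 elements. As you can imagine, the calculation takes some time. I want this function to run as fast as possible.

Here comes the challenge: assume that it is now 5am and listNumbers contains only 4,999,999 values. That is, its last value is not available yet. It will only become available at 6am.

Obviously, we can do the following (1st mode): wait until 6am, append the last value to listNumbers, and then run myFunction. This works, but it takes a while, because myFunction has to process the entire list from the first element onwards. Remember, our goal is to get the result as soon as possible after 6am.

I was wondering whether there is a more efficient way to solve this (2nd mode): since at 5am we already have 4,999,999 of the values in listNumbers, we can start running myFunction immediately. We process whatever we can (remember, the last piece of data is not there yet), and then, exactly at 6am, "plug in" the new data and produce the result. This should be much faster, because most of the processing is done before 6am, so only the new data remains to be handled, which means the result should be available almost immediately after 6am.

Suppose we cannot view or modify the code of myFunction. Is there any programming technique or design idea that lets us use myFunction as is, without changing its code, and have it run in the 2nd mode rather than the 1st?

Please do not suggest C++, numpy + cython, parallel computing and the like. The goal here is to see whether there is any programming technique or design pattern that can easily be used to solve this kind of problem.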

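Tags: real-time computation, concurrent programming, design patterns, data processing, function design, algorithm optimization, performance, computational complexity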

8 Answers

Subclass list so that when the function tries to read the last value, it blocks until another thread supplies it.

import threading
import time

class lastblocks(list):
    def __init__(self,*args,**kwargs):
        list.__init__(self,*args,**kwargs)
        self.e = threading.Event()
    def __getitem__(self, index):
        v1 = list.__getitem__(self,index)
        if index == len(self)-1:
            self.e.wait()
            v2 = list.__getitem__(self,index)
            return v2
        else:
            return v1


l = lastblocks(range(5000000-1)+[None])

def reader(l):
    s = 0
    for i in xrange(len(l)):
        s += l[i]
    print s

def writer(l):
    time.sleep(10)
    l[5000000-1]=5000000-1
    l.e.set()
    print "written"

reader = threading.Thread(target=reader, args=(l,))
writer = threading.Thread(target=writer, args=(l,))
reader.start()
writer.start()

Prints:

written
12499997500000
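
The listing above is written for Python 2 (xrange, print statement). As a rough sketch only, the same blocking-list idea might look like this in Python 3. The names LastBlocks, index_sum and run_blocking_list_demo are made up for illustration, index_sum is merely a stand-in for myFunction, and the list size and delay are shrunk so the demo finishes quickly.

```python
import threading


class LastBlocks(list):
    """A list whose final element blocks on read until `ready` is set."""

    def __init__(self, *args):
        super().__init__(*args)
        self.ready = threading.Event()

    def __getitem__(self, index):
        # Only plain integer indexing of the last slot blocks; slices,
        # negative indices and direct iteration are not covered here.
        if index == len(self) - 1:
            self.ready.wait()
        return super().__getitem__(index)


def index_sum(numbers):
    # Stand-in for myFunction (an assumption): an index-based loop.
    total = 0
    for i in range(len(numbers)):
        total += numbers[i]
    return total


def run_blocking_list_demo(n=1000, delay=0.05):
    data = LastBlocks(list(range(n - 1)) + [None])

    def writer():
        data[n - 1] = n - 1  # the late value arrives
        data.ready.set()     # release the blocked reader

    timer = threading.Timer(delay, writer)
    timer.start()
    result = index_sum(data)  # blocks at the last index until writer runs
    timer.join()
    return result
```

Calling run_blocking_list_demo() returns the same total as summing range(n) directly, because everything except the last element has already been processed by the time the late value is written.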

For numpy:

import threading
import time

import numpy as np

class lastblocks(np.ndarray):
    def __new__(cls, arry):
        obj = np.asarray(arry).view(cls)
        obj.e = threading.Event()
        return obj
    def __array_finalize__(self, obj):
        if obj is None: return
        self.e = getattr(obj, 'e', None)

    def __getitem__(self, index):
        v1 = np.ndarray.__getitem__(self,index)
        if index == len(self)-1:
            self.e.wait()
            v2 = np.ndarray.__getitem__(self,index)
            return v2
        else:
            return v1


l = lastblocks(np.asarray(range(5000000-1)+[None]))

def reader(l):
    s = 0
    for i in xrange(len(l)):
        s += l[i]
    print s

def writer(l):
    time.sleep(10)
    l[5000000-1]=5000000-1
    l.e.set()
    print "written"

reader = threading.Thread(target=reader, args=(l,))
writer = threading.Thread(target=writer, args=(l,))
reader.start()
writer.start()

Answered 2025-04-16 by Python大师

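By "list of numbers", do you mean the built-in Python list type?

If not, it's simple. Python uses duck typing, so passing in any sequence that supports iteration will do. You can use the yield keyword to pass a generator.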
```
def delayed_list():
    for val in numpy_array[:4999999]:
        yield val
    wait_until_6am()
    yield numpy_array[4999999]
```

Then,

    myFunction(delayed_list())

If it is, then it's a bit more complicated :)
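
For reference, here is a minimal runnable sketch of the generator idea in Python 3. It assumes the consumer only iterates over its input. A function that calls len() or indexes its argument, as the myFunction skeleton in the question does, cannot take a generator, because len() on a generator raises TypeError. The names delayed_values, iter_sum and run_generator_demo are hypothetical, and a threading.Event stands in for wait_until_6am().

```python
import threading


def delayed_values(early_values, late_box, ready):
    """Yield the early values, then wait for the late one."""
    for value in early_values:
        yield value
    ready.wait()        # block until the last value has been published
    yield late_box[0]


def iter_sum(numbers):
    # Hypothetical consumer that only iterates; it never calls len()
    # or indexes, so a generator is an acceptable input.
    total = 0
    for value in numbers:
        total += value
    return total


def run_generator_demo(n=1000, delay=0.05):
    early = list(range(n - 1))
    late_box = []              # the writer appends the late value here
    ready = threading.Event()

    def writer():
        late_box.append(n - 1)
        ready.set()

    timer = threading.Timer(delay, writer)
    timer.start()
    result = iter_sum(delayed_values(early, late_box, ready))
    timer.join()
    return result
```

The consumer works through all the early values while the timer is still pending, so only the final addition happens after the late value shows up.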

Also, take a look at PEP8 for the recommended Python code style:

No spaces around parentheses
Use my_function rather than myFunction
Use for i, val in enumerate(numbers): rather than for i in xrange(0, len(listNumbers), 1): and so on.
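
To make the last point concrete, here is a small illustrative function written with enumerate. The calculation itself (a sum of index times value) is invented for the example and is not the asker's actual computation. Because the loop never calls len() or indexes its input, it would also accept a generator.

```python
def weighted_sum(numbers):
    """Illustrative calculation: the sum of index * value."""
    total = 0
    for i, val in enumerate(numbers):
        total += i * val
    return total
```

For instance, weighted_sum([10, 20, 30]) evaluates 0*10 + 1*20 + 2*30.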

Answered 2025-04-16 by Python大师

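You can use a generator as the input. The generator only returns a value once there is data available to process.

Update: thanks for the brilliant comment, I wanted to remove this one :)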

class lazylist(object):
    def __init__(self):
        self.cnt = 0
        self.length = 5000000

    def __iter__(self):
        return self

    def __len__(self):
        return self.length

    def next(self):
        if self.cnt < self.length:
            self.cnt += 1
            #return data here or wait for it
            return self.cnt #just return a counter for this example
        else:
            raise StopIteration()

    def __getitem__(self, i):
        #again, block till you have data.
        return i+1 #simple counter

myFunction(lazylist())

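Update: as the comments and the other solutions show, your loop construct and the len call cause a lot of trouble. If you can get rid of them, a far more elegant solution becomes possible. Using for e in li or enumerate is the more Pythonic way to go.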
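
If the len call and the index loop have to stay, one possible way to fill in the "block till you have data" placeholders is sketched below in Python 3. This is an assumption-laden illustration, not the answerer's code: LazySequence, put, index_sum and run_lazy_demo are hypothetical names, and a threading.Condition is used so that any slot, not only the last one, may arrive late.

```python
import threading


class LazySequence:
    """Fixed-length sequence whose items may be supplied later."""

    _MISSING = object()

    def __init__(self, length):
        self._items = [self._MISSING] * length
        self._cond = threading.Condition()

    def __len__(self):
        return len(self._items)

    def put(self, index, value):
        with self._cond:
            self._items[index] = value
            self._cond.notify_all()  # wake readers waiting on any slot

    def __getitem__(self, index):
        if not 0 <= index < len(self._items):
            raise IndexError(index)
        with self._cond:
            # Sleep until the requested slot has been filled.
            self._cond.wait_for(
                lambda: self._items[index] is not self._MISSING)
            return self._items[index]


def index_sum(numbers):
    # Stand-in for myFunction (an assumption): len() plus index access.
    total = 0
    for i in range(len(numbers)):
        total += numbers[i]
    return total


def run_lazy_demo(n=1000, delay=0.05):
    seq = LazySequence(n)
    for i in range(n - 1):
        seq.put(i, i)
    timer = threading.Timer(delay, seq.put, args=(n - 1, n - 1))
    timer.start()
    result = index_sum(seq)  # blocks only when it reaches the empty slot
    timer.join()
    return result
```

Since the object only implements __len__ and __getitem__, it relies on duck typing and would not pass an isinstance(x, list) check inside the consumer.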

Answered 2025-04-16 by Python大师
