我的算法运行时间复杂度 - 如何计算并进一步优化它？

Question

我设计了一个递归算法，并用Python写了出来。当我用不同的参数测量运行时间时，发现它的运行时间似乎是指数级的。而且，即使是像50这样的小数字，它也要花超过半个小时才能结束。（我没有等到它完成，但看起来在合理的时间内是无法完成的，我猜是指数级的）。

所以，我对这个算法的运行时间复杂度很感兴趣。有人能帮我推导出方程T(n,m)吗？或者计算一下大O符号吗？

算法如下：

# parameters:
# search string, the index where we left on the search string, source string, index where we left on the source string,
# and the indexes array, which keeps track of the indexes found for the characters
def find(search, searchIndex, source, sourceIndex, indexes):
    found = None
    if searchIndex < len(search): # if we haven't reached the end of the search string yet
        found = False
        while sourceIndex < len(source): # loop thru the source, from where we left off
            if search[searchIndex] == source[sourceIndex]: # if there is a character match
                # recursively look for the next character of search string 
                # to see if it can be found in the remaining part of the source string
                if find(search, searchIndex + 1, source, sourceIndex + 1, indexes):
                    # we have found it
                    found = True # set found = true
                    # if an index for the character in search string has never been found before.
                    # i.e if this is the first time we are finding a place for that current character
                    if indexes[searchIndex] is None:
                        indexes[searchIndex] = sourceIndex # set the index where a match is found
                    # otherwise, if an index has been set before but it's different from what
                    # we are trying to set right now. so that character can be at multiple places.
                    elif indexes[searchIndex] != sourceIndex: 
                        indexes[searchIndex] = -1 # then set it to -1.
            # increment sourceIndex at each iteration so as to look for the remaining part of the source string. 
            sourceIndex = sourceIndex + 1
    return found if found is not None else True

def theCards(N, colors):
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    # so in this example where N=7, allcards=['B','R','R','B','R','B','R']
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]
    # indexes is initially None.
    indexes = [None] * len(colors)

    find(colors, 0, allcards, 0, indexes)
    return indexes    

if __name__ == "__main__":
    print theCards(7, list("BBB"))

我不知道是否需要理解问题和算法才能推导出最坏情况下的运行时间，但这是我尝试解决的问题：

问题：

给定一个源字符串SRC和一个搜索字符串SEA，找到在SRC中出现的子序列SEA，并返回每个字符在SRC中找到的位置索引。如果SEA中的某个字符在SRC中可以出现多个位置，就返回-1作为该字符的位置。

举个例子；如果源字符串是BRRBRBR（N=7），而搜索字符串是BBB：那么'BBB'中的第一个'B'可以出现在源字符串的索引0。第二个'B'可以在索引3，而最后一个'B'可以在索引5。此外，'BBB'字符的位置没有其他选择，因此算法返回[0,3,5]。

再举一个例子，如果源字符串是BRRBRB（N=6），而搜索字符串是RBR：'RBR'中的第一个'R'可以在位置1或2。这就只留下位置3给'B'，位置4给最后一个'R'。所以，第一个'R'可以出现在多个地方，它的位置是不确定的。其他两个字符'B'和'R'只有一个位置。因此算法返回[-1,4,5]。

算法无法完成并且永远运行下去的情况是，当源字符串是['B', 'R', 'R', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'B', 'B', 'B', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'B', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'R', 'B', 'B', 'B', 'R', 'B', 'B', 'B', 'B', 'B', 'R', 'B', 'B', 'B', 'B', 'B']（N=58），而搜索字符串是RBRRBRBBRBRRBBRRBBBRRBBBRR。它应该返回[-1, -1, -1, -1, -1, -1, -1, -1, 17, 18, 19, 23, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 47, 53]，但不幸的是，它没有完成 =(

优化：

我考虑过在'indexes'列表完全填满-1时停止搜索。但这只影响最好的情况（或者可能是平均情况），而不影响最坏情况。还有什么方法可以进一步优化这个算法呢？我知道这个问题存在多项式解决方案。

比起优化，我更好奇的是运行时间的T(n,m)方程，其中n和m是源字符串和搜索字符串的长度。

如果你能看到这里，非常感谢你！ =)

编辑 - IVIad的解决方案实现：

def find2(search, source):
    indexes = list()
    last = 0
    for ch in search:
        if last >= len(source):
            break
        while last < len(source) and source[last] != ch:
            last = last + 1
        indexes.append(last)
        last = last + 1
    return indexes

def theCards(N, colors):
    # allcards: a list 1..N of characters where allcards[i] is 'R' if i is a prime number, 'B' otherwise.
    allcards = ['R' if isPrime(i) else 'B' for i in range(1, N + 1)]

    indexes = find2(colors, allcards) # find the indexes of the first occurrences of the characters
    colors.reverse() # now reverse both strings
    allcards.reverse()
    # and find the indexes of the first occurrences of the characters, again, but in reversed order
    indexesreversed = find2(colors, allcards)
    indexesreversed.reverse() # reverse back the resulting list of indexes 
    indexesreversed = [len(allcards) - i - 1 for i in indexesreversed] # fix the indices

    # return -1 if the indices are different when strings are reversed
    return [indexes[i] + 1 if indexes[i] == indexesreversed[i] else - 1 for i in range(0, len(indexes))]

if __name__ == "__main__":
    print theCards(495, list("RBRRBRBBRBRRBBRRBBBRRBBBRR"))

时间复杂度递归算法优化算法算法复杂度大O符号最坏情况分析子序列查找指数级复杂度

我的算法运行时间复杂度 - 如何计算并进一步优化它？

3 个回答

撰写回答