Binary search + linear search to find multiple matches. How do I return a tuple of results? (Python)

Question

I have written a search algorithm that finds a string in a sorted list, then checks the entries on either side of it for duplicates.

import re

found = []
missing = []

def find_media(media, drive_inv):
    """
    media is a string.
    drive_inv is a sorted list of strings.
    Use binary search to find a match,
    followed by a linear search either side
    to check for duplicates.
    Append each match to the global list, found;
    otherwise append media to the global list, missing.
    """

    def linear_search_up(media, line):
        """ line is an int, to index drive_inv with. """

        try:
            if re.search(media, drive_inv[line+1], re.IGNORECASE):
                found.append(drive_inv[line+1])
                return linear_search_up(media, line+1)
            else:
                return 

        except IndexError:
            return 

    def linear_search_down(media, line):
        """ line is an int, to index drive_inv with. """
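        # Stop at the start of the list: a negative index would wrap
        # around to the end of drive_inv instead of raising IndexError.
        if line == 0:
            return
        if re.search(media, drive_inv[line-1], re.IGNORECASE):
            found.append(drive_inv[line-1])
            return linear_search_down(media, line-1)
        else:
            return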

    def binary_search(media, low, high):
        """
        low and high are ints - the boundaries of the
        binary search algorithm.
        if a match is found, execute the linear search
        functions on the entries either side.
        """

        if high == low:
            if re.search(media, drive_inv[low], re.IGNORECASE):
                found.append(drive_inv[low])
                return
            else:
                missing.append(media)
                return

        mid = (low + high) // 2  # integer division, so mid is a valid index

        if re.search(media, drive_inv[mid], re.IGNORECASE):
            found.append(drive_inv[mid])

#           now check the entries either side
            return (
                    linear_search_up(media, mid),
                    linear_search_down(media, mid)
            )

#       if the filename > media, discard the larger entries
        elif drive_inv[mid].split('/')[-1] > media:
            if low == mid:
                missing.append(media)
                return
            else:
                return binary_search(media, low, mid-1)

#       if the filename < media, discard the smaller entries
        else:
            return binary_search(media, mid+1, high)

    if len(drive_inv) == 0:
        return
    else:
        return binary_search(media, 0, len(drive_inv)-1)

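The algorithm seems to work well, but it is a bit awkward because it appends its results to global lists. I would like it to return a tuple containing all the matches instead. However, if I change: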

found.append(drive_inv[line+1])
return linear_search_up(media, line+1)

to:

return (
        drive_inv[line+1],
        linear_search_up(media, line+1)
)

the tuple I get back looks like:

(('A001C002', ('A001C002', None)), ('A001C002', ('A001C002', ('A001C002', ('A001C002', None)))))

...which is not much use.
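
The nesting appears because each recursive result is placed inside a new tuple as a single element, and the base case returns None. A minimal sketch of one way to keep the recursion but get a flat tuple: return an empty tuple in the base case and concatenate with `+`. Passing `drive_inv` as an explicit argument and the sample list `inv` are assumptions made only to keep the example self-contained.

```python
import re

def linear_search_up(media, drive_inv, line):
    """Return a flat tuple of the consecutive matches above index `line`."""
    nxt = line + 1
    # Base case: past the end of the list, or the next entry does not match.
    if nxt >= len(drive_inv) or not re.search(media, drive_inv[nxt], re.IGNORECASE):
        return ()
    # (x,) + rest concatenates two tuples; (x, rest) would nest them.
    return (drive_inv[nxt],) + linear_search_up(media, drive_inv, nxt)

# Hypothetical inventory, for illustration only.
inv = ["A001C001", "A001C002", "A001C002", "A001C002", "A001C003"]
result = linear_search_up("A001C002", inv, 1)
print(result)  # ('A001C002', 'A001C002')
```

The same change would apply to the downward search, and the binary search could then return the middle match concatenated with both results.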

Can this algorithm be rewritten so that it still uses recursion, or should I consider a different approach?
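
If recursion is not a requirement, one alternative is the standard library's `bisect` module followed by a forward scan. The sketch below rests on assumptions not stated in the question: that `drive_inv` is sorted by lower-cased filename (the part after the last `/`), and that a "match" means the filename starts with `media`, ignoring case. This is narrower than the `re.search` test in the code above, and the sample paths are hypothetical.

```python
from bisect import bisect_left

def find_media(media, drive_inv):
    """Return a tuple of all entries whose filename starts with `media`."""
    # Assumption: drive_inv is already sorted by this same key.
    keys = [path.split('/')[-1].lower() for path in drive_inv]
    target = media.lower()
    # Index of the first key >= target. In a sorted list, every key that
    # starts with target sits in one contiguous run beginning here.
    i = bisect_left(keys, target)
    matches = []
    while i < len(keys) and keys[i].startswith(target):
        matches.append(drive_inv[i])
        i += 1
    return tuple(matches)

# Hypothetical inventory, for illustration only.
inv = [
    "/vol/a/A001C001.mov",
    "/vol/a/A001C002.mov",
    "/vol/b/a001c002.wav",
    "/vol/a/A001C003.mov",
]
print(find_media("A001C002", inv))
# ('/vol/a/A001C002.mov', '/vol/b/a001c002.wav')
```

An empty tuple then plays the role of the `missing` list: the caller can check `if not result`. Building `keys` takes linear time, so for repeated lookups it would make sense to compute it once outside the function.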

Tags: tuples, recursion, search algorithm, binary search, sorted list, duplicates, linear search
