Binary search + linear search to find multiple matches. How do I return a tuple of results? (Python)

Asked 2025-04-18 19:48

I wrote a search algorithm that finds a string in a sorted list and then checks the entries on either side of it for duplicates.

import re

found = []
missing = []

def find_media(media, drive_inv):
    """
    media is a string.
    drive_inv is a list of strings.
    use binary search to find a match,
    followed by a linear search either side
    to check for duplicates.
    append a match to the global list, found.
    else append to the global list, missing.
    """

    def linear_search_up(media, line):
        """ line is an int, to index drive_inv with. """

        try:
            if re.search(media, drive_inv[line+1], re.IGNORECASE):
                found.append(drive_inv[line+1])
                return linear_search_up(media, line+1)
            else:
                return 

        except IndexError:
            return 

    def linear_search_down(media, line):
        """ line is an int, to index drive_inv with. """
        try:
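            # guard line > 0: drive_inv[-1] would wrap around to the last entry instead of raising IndexError
            if line > 0 and re.search(media, drive_inv[line-1], re.IGNORECASE):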
                found.append(drive_inv[line-1])
                return linear_search_down(media, line-1)
            else:
                return 

        except IndexError:
            return 

    def binary_search(media, low, high):
        """
        low and high are ints - the boundaries of the
        binary search algorithm.
        if a match is found, execute the linear search
        function on the entries either side.
        """

        if high == low:
            if re.search(media, drive_inv[low], re.IGNORECASE):
                found.append(drive_inv[low])
                return
            else:
                missing.append(media)
                return

        mid = (low + high) // 2  # integer division, so mid is a valid list index

        if re.search(media, drive_inv[mid], re.IGNORECASE):
            found.append(drive_inv[mid])

#           now check the entries either side
            return (
                    linear_search_up(media, mid),
                    linear_search_down(media, mid)
            )

#       if the filename > media, discard the larger entries
        elif drive_inv[mid].split('/')[-1] > media:
            if low == mid:
                missing.append(media)
                return
            else:
                return binary_search(media, low, mid-1)

#       if the filename < media, discard the smaller entries
        else:
            return binary_search(media, mid+1, high)

    if len(drive_inv) == 0:
        return
    else:
        return binary_search(media, 0, len(drive_inv)-1)

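The algorithm seems to work well, but it is a bit awkward because it appends its results to global lists. I would like it to return a tuple containing all the matches instead. However, if I change: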

found.append(drive_inv[line+1])
return linear_search_up(media, line+1)

to:

return (
        drive_inv[line+1],
        linear_search_up(media, line+1)
)

the tuple I get back looks like this:

(('A001C002', ('A001C002', None)), ('A001C002', ('A001C002', ('A001C002', ('A001C002', None)))))

...which is not much use.

Can this algorithm be rewritten so that it still uses recursion? Or should I consider a different approach?

1 Answer

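You could probably modify your code to do what you want, but the approach below also does the job. It may be faster, and it does not require the list of strings to be sorted. Note that it counts entries that are exactly equal to media, not case-insensitive regex matches.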

from collections import Counter

def find_media(media, drive_inv):
    cnt = Counter(drive_inv).get(media, 0)
    return (media,)*cnt if cnt else None

drive_inv = ['A001C000', 'A001C000', 'A001C001', 'A001C002', 'A001C002',
             'A001C002', 'A001C003', 'A001C003', 'A001C003', 'A001C004',
             'A001C005']

print(find_media('A001C002', drive_inv)) # -> ('A001C002', 'A001C002', 'A001C002')
print(find_media('A001C099', drive_inv)) # -> None

If you would rather get an empty tuple than None when media is not found, change the function's return statement to:

    return (media,)*cnt
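
If you want to keep the recursion from the question, the nesting can be avoided by concatenating tuples instead of wrapping them: return `(item,) + rest` in the recursive case and an empty tuple `()` in the base case. The following is only a minimal sketch under those assumptions. `collect_up` and `collect_down` are hypothetical names that mirror `linear_search_up` and `linear_search_down`, and they take `drive_inv` as an argument rather than reading it from an enclosing scope.

```python
import re

def collect_up(media, drive_inv, line):
    """Return a flat tuple of consecutive matches above index `line`.

    Hypothetical helper for illustration; it mirrors linear_search_up
    from the question but returns its results instead of appending them.
    """
    nxt = line + 1
    # Base case: end of the list, or the next entry does not match.
    if nxt >= len(drive_inv) or not re.search(media, drive_inv[nxt], re.IGNORECASE):
        return ()  # empty tuple keeps the concatenation below flat
    # (x,) + tuple concatenates; (x, tuple) would nest, as in the question.
    return (drive_inv[nxt],) + collect_up(media, drive_inv, nxt)


def collect_down(media, drive_inv, line):
    """Return a flat tuple of consecutive matches below index `line`."""
    prev = line - 1
    # Base case: start of the list (avoids the -1 wrap-around), or no match.
    if prev < 0 or not re.search(media, drive_inv[prev], re.IGNORECASE):
        return ()
    # Lower entries go first so the result stays in list order.
    return collect_down(media, drive_inv, prev) + (drive_inv[prev],)
```

Once the binary search has found a match at index `mid`, the full result in list order would be `collect_down(media, drive_inv, mid) + (drive_inv[mid],) + collect_up(media, drive_inv, mid)`. For very long runs of duplicates, a plain loop would avoid hitting Python's recursion limit.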

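
Another option, if exact matches are enough and the list really is sorted, is to let the standard library's `bisect` module find both ends of the run of duplicates with two binary searches and no recursion. This is a sketch, not a drop-in replacement. It assumes `drive_inv` is sorted and holds bare names rather than paths, it does not do the case-insensitive regex matching from the question, and `find_media_sorted` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right

def find_media_sorted(media, drive_inv):
    """Return a tuple of all entries equal to `media` in a sorted list.

    Hypothetical helper for illustration; assumes exact string equality
    and that drive_inv is already sorted.
    """
    lo = bisect_left(drive_inv, media)   # index of the first entry >= media
    hi = bisect_right(drive_inv, media)  # index just past the last entry == media
    return tuple(drive_inv[lo:hi])       # empty tuple when nothing matches


drive_inv = ['A001C000', 'A001C000', 'A001C001', 'A001C002', 'A001C002',
             'A001C002', 'A001C003', 'A001C003', 'A001C003', 'A001C004',
             'A001C005']

print(find_media_sorted('A001C002', drive_inv))  # -> ('A001C002', 'A001C002', 'A001C002')
print(find_media_sorted('A001C099', drive_inv))  # -> ()
```

Each lookup is two O(log n) searches plus a slice of the matching run, so the benefit of keeping the list sorted is preserved.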