Binary search + linear search to find multiple matches. How do I return a tuple of results? (Python)
I've written a search algorithm that finds a string in a sorted list and then checks the entries on either side of it for duplicates.
import re

found = []
missing = []

def find_media(media, drive_inv):
    """
    media is a string.
    drive_inv is a list of strings.
    use binary search to find a match,
    followed by a linear search either side
    to check for duplicates.
    append a match to the global list, found.
    else append to the global list, missing.
    """
    def linear_search_up(media, line):
        """ line is an int, to index drive_inv with. """
        try:
            if re.search(media, drive_inv[line+1], re.IGNORECASE):
                found.append(drive_inv[line+1])
                return linear_search_up(media, line+1)
            else:
                return
        except IndexError:
            return

    def linear_search_down(media, line):
        """ line is an int, to index drive_inv with. """
        try:
            if re.search(media, drive_inv[line-1], re.IGNORECASE):
                found.append(drive_inv[line-1])
                return linear_search_down(media, line-1)
            else:
                return
        except IndexError:
            return

    def binary_search(media, low, high):
        """
        low and high are ints - the boundaries of the
        binary search algorithm.
        if a match is found, execute the linear search
        function on the entries either side.
        """
        if high == low:
            if re.search(media, drive_inv[low], re.IGNORECASE):
                found.append(drive_inv[low])
                return
            else:
                missing.append(media)
                return
        # integer division, so mid is always a valid list index
        mid = (low + high) // 2
        if re.search(media, drive_inv[mid], re.IGNORECASE):
            found.append(drive_inv[mid])
            # now check the entries either side
            return (
                linear_search_up(media, mid),
                linear_search_down(media, mid)
            )
        # if the filename > media, discard the larger entries
        elif drive_inv[mid].split('/')[-1] > media:
            if low == mid:
                missing.append(media)
                return
            else:
                return binary_search(media, low, mid-1)
        # if the filename < media, discard the smaller entries
        else:
            return binary_search(media, mid+1, high)

    if len(drive_inv) == 0:
        return
    else:
        return binary_search(media, 0, len(drive_inv)-1)
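The algorithm seems to work well, but it's a bit awkward because it appends its results to global lists. I'd like it to return a tuple containing all the matches instead. However, if I change: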
found.append(drive_inv[line+1])
return linear_search_up(media, line+1)
to:
return (
    drive_inv[line+1],
    linear_search_up(media, line+1)
)
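the tuple I get back looks like:
(('A001C002', ('A001C002', None)), ('A001C002', ('A001C002', ('A001C002', ('A001C002', None)))))
...which isn't much use.
Can this algorithm be rewritten so that it still uses recursion? Or should I be looking at a different approach?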
1 Answer
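Although you could probably modify your code to do what you want, the following also does the job, is probably quicker, and doesn't require the list of strings to be sorted. Note that it counts entries that are exactly equal to media; it does not apply re.search.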
from collections import Counter

def find_media(media, drive_inv):
    # count how many entries are exactly equal to media
    cnt = Counter(drive_inv).get(media, 0)
    return (media,)*cnt if cnt else None

drive_inv = ['A001C000', 'A001C000', 'A001C001', 'A001C002', 'A001C002',
             'A001C002', 'A001C003', 'A001C003', 'A001C003', 'A001C004',
             'A001C005']

print(find_media('A001C002', drive_inv))  # -> ('A001C002', 'A001C002', 'A001C002')
print(find_media('A001C099', drive_inv))  # -> None
If you'd prefer to return an empty tuple rather than None when media isn't found, change the function's return statement to:
return (media,)*cnt
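
If you would rather keep the recursive binary-then-linear structure from the question, the nesting goes away once every recursive call returns a flat tuple and the pieces are joined with + instead of being wrapped in a new tuple. The following is only a minimal sketch, not the question's exact code: it assumes drive_inv is sorted and holds plain names compared with ==, not paths matched with re.search, and the helper names scan and binary_search are made up for illustration. It also stops explicitly below index 0, because drive_inv[-1] wraps around to the last entry in Python instead of raising IndexError.

```python
def find_media(media, drive_inv):
    """Return a flat tuple of every entry in the sorted list drive_inv equal to media."""

    def scan(index, step):
        # Recursively collect equal neighbours in one direction
        # (step is +1 to go up, -1 to go down).
        # The base case returns (), so concatenating with + keeps the result flat.
        if index < 0 or index >= len(drive_inv) or drive_inv[index] != media:
            return ()
        return (drive_inv[index],) + scan(index + step, step)

    def binary_search(low, high):
        if low > high:
            # search range is empty: nothing matched
            return ()
        mid = (low + high) // 2
        if drive_inv[mid] == media:
            # matches found below mid, mid itself, then matches found above mid
            return scan(mid - 1, -1) + (drive_inv[mid],) + scan(mid + 1, 1)
        if drive_inv[mid] > media:
            # discard the larger entries
            return binary_search(low, mid - 1)
        # discard the smaller entries
        return binary_search(mid + 1, high)

    return binary_search(0, len(drive_inv) - 1)
```

With the sample drive_inv list above, find_media('A001C002', drive_inv) returns a tuple of three 'A001C002' strings, and a name that is not in the list returns (). Each duplicate costs one level of recursion, so a very long run of identical entries could hit Python's recursion limit; replacing the recursion in scan with a loop would avoid that.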