Finding gaps in a sequence of strings

7 votes

4 answers

3204 views

Asked 2025-04-16 07:38

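I have a series of strings such as 0000001, 0000002, 0000003, ... up to 2 million. They are not contiguous, meaning there are gaps. For example, after 0000003 the next string might be 0000006. I need to find all of these gaps. In this example, the gaps are 0000004 and 0000005.

Here is what I have done so far: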

gaps  = list()
total = len(curr_ids)

for i in range(total):
    tmp_id = '%s' %(str(i).zfill(7))
    if tmp_id in curr_ids:
        continue
    else:
        gaps.append(tmp_id)
return gaps

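As you may have guessed, this is slow because I am using a list. If I pre-populate a dict with curr_ids, it will be faster. But what is the complexity of populating a hash table? Is there a faster way to do this?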

Tags: data structures, string processing, hash table, algorithm optimization, performance, gap finding

4 Answers

seq = curr_ids  # the sequence of ID strings from the question
n = 2000000

# every ID that should exist, minus the IDs that actually exist
gaps = set(str(i).zfill(7) for i in range(1, n + 1)) - set(seq)
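
On the complexity question: building a Python set (or dict) from n strings takes O(n) time on average, and each membership test afterwards is O(1) on average, so the whole search is linear rather than quadratic. A minimal sketch of the same idea as a function that also returns the gaps in order; the name `find_gaps_with_set` and the parameters `n` and `width` are illustrative and not part of the original answer:

```python
def find_gaps_with_set(curr_ids, n, width=7):
    """Return the zero-padded IDs in 1..n that are absent from curr_ids."""
    # Building the set is O(len(curr_ids)) on average.
    present = set(curr_ids)
    # Each `not in` test on a set is O(1) on average, versus O(n) on a list.
    return [
        tmp_id
        for tmp_id in (str(i).zfill(width) for i in range(1, n + 1))
        if tmp_id not in present
    ]


if __name__ == "__main__":
    print(find_gaps_with_set(["0000001", "0000002", "0000003", "0000006"], 6))
```

Unlike the plain set difference, the list comprehension keeps the missing IDs in ascending order.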


Answered 2025-04-16 by Python大师


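If you need to track 2 million integers, you could use a bitarray. Each bit stands for one integer: the integer is the bit's position in the bitarray, and the bit's value records whether that integer is present. Here is an example: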

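import bitarray  # third-party package

gaps = []
# total is assumed to be the largest possible ID (e.g. 2000000)
# bitarray is 0 based, so total + 1 bits make index `total` valid
a = bitarray.bitarray(total + 1)
a.setall(False)
# mark every ID that is present
for sid in curr_ids:
    a[int(sid)] = True
# IDs start at 1; the upper bound is total + 1 so that `total` itself is checked
for i in range(1, total + 1):
    if not a[i]:
        gaps.append('%07d' % i)
return gaps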
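
If installing the third-party bitarray package is not an option, the same presence-flag idea can be sketched with the standard library's `bytearray`. This is a swapped-in substitute, not what the answer uses: it spends one byte per ID instead of one bit (roughly 2 MB rather than about 250 KB for 2 million IDs). The function name `find_gaps_bytearray` and its parameters are illustrative, and `total` is assumed to be the largest possible ID:

```python
def find_gaps_bytearray(curr_ids, total, width=7):
    """Return the zero-padded IDs in 1..total that are absent from curr_ids."""
    # bytearray(n) creates n zero bytes; index i is the flag for integer i.
    seen = bytearray(total + 1)
    for sid in curr_ids:
        seen[int(sid)] = 1
    # Index 0 is unused because the IDs start at 1.
    return [str(i).zfill(width) for i in range(1, total + 1) if not seen[i]]


if __name__ == "__main__":
    print(find_gaps_bytearray(["0000001", "0000003", "0000006"], 6))
```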

Answered 2025-04-16 by Python大师


You could sort the IDs first and then make a single pass over them:

def find_gaps(ids):
    """Generate the gaps in the list of ids."""
    j = 1
    for id_i in sorted(ids):
        while True:
            id_j = '%07d' % j
            j += 1
            if id_j >= id_i:
                break
            yield id_j

>>> list(find_gaps(["0000001", "0000003", "0000006"]))
['0000002', '0000004', '0000005']

If the input list is already sorted, you can skip the sorted call (though it does no harm: on an already-sorted list, Python's adaptive mergesort runs in O(n)).
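
The generator above compares zero-padded strings, which works because fixed-width padded strings sort in numeric order. It also assumes every ID appears once; with a repeated ID, the value right after it would be skipped. A hedged variant that compares integers and tolerates duplicates could look like the sketch below, where the name `find_gaps_sorted` and the `width` parameter are illustrative rather than taken from the answer:

```python
def find_gaps_sorted(ids, width=7):
    """Yield zero-padded IDs missing between 1 and the largest ID in ids."""
    expected = 1  # the next integer we hope to see
    for current in sorted(int(s) for s in ids):
        # Everything from `expected` up to (not including) `current` is missing.
        for missing in range(expected, current):
            yield str(missing).zfill(width)
        # max() keeps `expected` from moving backwards on duplicate IDs.
        expected = max(expected, current + 1)


if __name__ == "__main__":
    print(list(find_gaps_sorted(["0000001", "0000003", "0000006"])))
```

Like the original generator, this only reports gaps below the largest ID present; IDs missing after the last one are not reported.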

Answered 2025-04-16 by Python大师

