python heapq排序列表错误?

2024-05-16 07:16:58 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在尝试将列表排序为一个列表,其中包含节、子节和子节的编号和名称。程序如下所示:

import heapq

sections = ['1. Section', '2. Section', '3. Section', '4. Section', '5. Section', '6. Section', '7. Section', '8. Section', '9. Section', '10. Section', '11. Section', '12. Section']
subsections = ['1.1 Subsection', '1.2 Subsection', '1.3 Subsection', '1.4 Subsection', '2.1 Subsection', '4.1 My subsection', '7.1 Subsection', '8.1 Subsection', '12.1 Subsection']
subsubsections = ['1.2.1 Subsubsection', '1.2.2 Subsubsection', '1.4.1 Subsubsection', '2.1.1 Subsubsection', '7.1.1 Subsubsection', '8.1.1 Subsubsection', '12.1.1 Subsubsection']

sorted_list = list(heapq.merge(sections, subsections, subsubsections))

print(sorted_list)

我要说的是:

^{pr2}$

我的第12小节和小节位于第8小节,而不是第12小节。在

为什么会这样?最初的名单被排序,一切顺利,显然排到了第10位。在

我不知道为什么会发生这种情况,有没有一种方法可以更好地根据列表中的数字将其分类为“树”?我正在构建一个目录,它将返回(一旦我过滤掉列表)

1. Section
    1.1 Subsection
    1.2 Subsection
        1.2.1 Subsubsection
        1.2.2 Subsubsection
    1.3 Subsection
    1.4 Subsection
        1.4.1 Subsubsection
2. Section
    2.1 Subsection
        2.1.1 Subsubsection
3. Section
4. Section
    4.1 My subsection
5. Section
6. Section
7. Section
    7.1 Subsection
        7.1.1 Subsubsection
8. Section
    8.1 Subsection
    12.1 Subsection
        8.1.1 Subsubsection
        12.1.1 Subsubsection
9. Section
10. Section
11. Section
12. Section

注意8.1小节后面的12.1小节和8.1.1小节后面的12.1.1小节。在


Tags: 列表排序mysectionlistsortedheapqsubsection
2条回答

正如在其他答案中所解释的,您必须指定一个排序方法,否则python将按字典顺序对字符串进行排序。如果您使用的是python3.5+,那么可以在merge函数中使用key参数,在python3.5中,可以使用itertools.chain和{},作为一种通用方法,您可以使用regex来查找数字并将其转换为int:

In [18]: from itertools import chain
In [19]: import re
In [23]: sorted(chain.from_iterable((sections, subsections, subsubsections)),
                key = lambda x: [int(i) for i in re.findall(r'\d+', x)])
Out[23]: 
['1. Section',
 '1.1 Subsection',
 '1.2 Subsection',
 '1.2.1 Subsubsection',
 '1.2.2 Subsubsection',
 '1.3 Subsection',
 '1.4 Subsection',
 '1.4.1 Subsubsection',
 '2. Section',
 '2.1 Subsection',
 '2.1.1 Subsubsection',
 '3. Section',
 '4. Section',
 '4.1 My subsection',
 '5. Section',
 '6. Section',
 '7. Section',
 '7.1 Subsection',
 '7.1.1 Subsubsection',
 '8. Section',
 '8.1 Subsection',
 '8.1.1 Subsubsection',
 '9. Section',
 '10. Section',
 '11. Section',
 '12. Section',
 '12.1 Subsection',
 '12.1.1 Subsubsection']

你的列表在人眼看来可能是排序的。但是对于Python来说,您的输入并不是完全排序的,因为它按字典顺序对字符串进行排序。这意味着'12'按排序顺序在'8'之前,因为只比较前几个字符。在

因此,合并是完全正确的;在看到'8.1'字符串之后会遇到以'12.1'开头的字符串,但以'8.1.1'开头的字符串将在之后排序。在

必须使用键函数从字符串中提取整数元组才能正确排序:

section = lambda s: [int(d) for d in s.partition(' ')[0].split('.') if d]
heapq.merge(sections, subsections, subsubsections, key=section))

请注意,key参数仅在python3.5及更高版本中可用;在早期版本中,您必须手动执行decorate merge undecorate dance。在

演示(使用Python3.6):

^{pr2}$

keyed merge很容易后移植到Python 3.3和3.4:

import heapq

def _heappop_max(heap):
    lastelt = heap.pop()
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        heapq._siftup_max(heap, 0)
        return returnitem
    return lastelt

def _heapreplace_max(heap, item):
    returnitem = heap[0]
    heap[0] = item
    heapq._siftup_max(heap, 0)
    return returnitem

def merge(*iterables, key=None, reverse=False):    
    h = []
    h_append = h.append

    if reverse:
        _heapify = heapq._heapify_max
        _heappop = _heappop_max
        _heapreplace = _heapreplace_max
        direction = -1
    else:
        _heapify = heapify
        _heappop = heappop
        _heapreplace = heapreplace
        direction = 1

    if key is None:
        for order, it in enumerate(map(iter, iterables)):
            try:
                next = it.__next__
                h_append([next(), order * direction, next])
            except StopIteration:
                pass
        _heapify(h)
        while len(h) > 1:
            try:
                while True:
                    value, order, next = s = h[0]
                    yield value
                    s[0] = next()           # raises StopIteration when exhausted
                    _heapreplace(h, s)      # restore heap condition
            except StopIteration:
                _heappop(h)                 # remove empty iterator
        if h:
            # fast case when only a single iterator remains
            value, order, next = h[0]
            yield value
            yield from next.__self__
        return

    for order, it in enumerate(map(iter, iterables)):
        try:
            next = it.__next__
            value = next()
            h_append([key(value), order * direction, value, next])
        except StopIteration:
            pass
    _heapify(h)
    while len(h) > 1:
        try:
            while True:
                key_value, order, value, next = s = h[0]
                yield value
                value = next()
                s[0] = key(value)
                s[2] = value
                _heapreplace(h, s)
        except StopIteration:
            _heappop(h)
    if h:
        key_value, order, value, next = h[0]
        yield value
        yield from next.__self__

装饰-排序-取消装饰合并非常简单:

def decorate(iterable, key):
    for elem in iterable:
        yield key(elem), elem

sorted = [v for k, v in heapq.merge(
    decorate(sections, section), decorate(subsections, section)
    decorate(subsubsections, section))]

因为您的输入已经排序,所以使用合并排序更有效。最后,您可以使用sorted()但是:

from itertools import chain
result = sorted(chain(sections, subsections, subsubsections), key=section)

相关问题 更多 >