Python:如何将列表合并为簇?

5 投票
6 回答
2344 浏览
提问于 2025-04-16 04:19

我有一个包含元组的列表:

[(3,4), (18,27), (4,14)]

我需要写一段代码,把那些有重复数字的元组合并成一个新的列表,这个新列表里的所有元素都应该只包含唯一的数字。而且,这个列表还要按照元组的长度进行排序,也就是说:

>>> MergeThat([(3,4), (18,27), (4,14)])
[(3,4,14), (18,27)]

>>> MergeThat([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
[(57,66,76,85), (1,3,10), (15,21)]

我知道这有点像层次聚类算法,我之前读过相关的内容,但还是搞不明白。

有没有比较简单的代码可以实现一个叫 MergeThat() 的函数呢?

6 个回答

1

在编程中,有时候我们需要将一些数据从一个地方传到另一个地方。这就像把水从一个杯子倒到另一个杯子一样。为了做到这一点,我们通常会用到一些工具或者方法。

在这个过程中,可能会遇到一些问题,比如数据格式不对,或者数据丢失。这就像你在倒水的时候,水可能会洒出来一样。因此,我们需要确保在传输数据时,能够正确处理这些问题。

有时候,程序会给我们一些错误提示,这些提示就像是一个信号,告诉我们哪里出错了。我们需要仔细阅读这些提示,找出问题所在,然后进行修正。

总之,数据传输是编程中一个很重要的部分,我们需要掌握一些基本的方法和技巧,才能顺利地完成这个过程。

def collapse(L):
    """ The input L is a list that contains tuples of various sizes.
        If any tuples have shared elements, 
        exactly one instance of the shared and unshared elements are merged into the first tuple with a shared element.
        This function returns a new list that contain merged tuples and an int that represents how many merges were performed."""
    answer = []
    merges = 0
    seen = []   # a list of all the numbers that we've seen so far
    for t in L:
        tAdded = False
        for num in t:
            pleaseMerge = True
            if num in seen and pleaseMerge:
                answer += merge(t, answer)
                merges += 1
                pleaseMerge = False
                tAdded= True
            else:
                seen.append(num)
        if not tAdded:
            answer.append(t)

    return (answer, merges)

def merge(t, L):
    """ The input L is a list that contains tuples of various sizes.
        The input t is a tuple that contains an element that is contained in another tuple in L.
        Return a new list that is similar to L but contains the new elements in t added to the tuple with which t has a common element."""
    answer = []
    while L:
        tup = L[0]
        tupAdded = False
        for i in tup:
            if i in t:
                try:
                    L.remove(tup)
                    newTup = set(tup)
                    for i in t:
                        newTup.add(i)
                    answer.append(tuple(newTup))
                    tupAdded = True
                except ValueError:
                    pass
        if not tupAdded:
            L.remove(tup)
            answer.append(tup)
    return answer

def sortByLength(L):
    """ L is a list of n-tuples, where n>0.
        This function will return a list with the same contents as L 
        except that the tuples are sorted in non-ascending order by length"""

    lengths = {}
    for t in L:
        if len(t) in lengths.keys():
            lengths[len(t)].append(t)
        else:
            lengths[len(t)] = [(t)]

    l = lengths.keys()[:]
    l.sort(reverse=True)

    answer = []
    for i in l:
        answer += lengths[i]
    return answer

def MergeThat(L):
    answer, merges = collapse(L)
    while merges:
        answer, merges = collapse(answer)
    return sortByLength(answer)

if __name__ == "__main__":
    print 'starting'
    print MergeThat([(3,4), (18,27), (4,14)])
    # [(3, 4, 14), (18, 27)]
    print MergeThat([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
    # [(57, 66, 76, 85), (1, 10, 3), (15, 21)]
4
import itertools

def merge_it(lot):
    merged = [ set(x) for x in lot ] # operate on sets only
    finished = False
    while not finished:
        finished = True
        for a, b in itertools.combinations(merged, 2):
            if a & b:
                # we merged in this iteration, we may have to do one more
                finished = False
                if a in merged: merged.remove(a)
                if b in merged: merged.remove(b)    
                merged.append(a.union(b))
                break # don't inflate 'merged' with intermediate results
    return merged

if __name__ == '__main__':
    print merge_it( [(3,4), (18,27), (4,14)] )
    # => [set([18, 27]), set([3, 4, 14])]

    print merge_it( [(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)] )
    # => [set([21, 15]), set([1, 10, 3]), set([57, 66, 76, 85])]

    print merge_it( [(1,2), (2,3), (3,4), (4,5), (5,9)] )
    # => [set([1, 2, 3, 4, 5, 9])]

这里有一段代码示例(包括测试用例):http://gist.github.com/586252

9

我努力想搞明白这个问题,但在尝试了Ian的回答后(谢谢他!),我才意识到理论上的问题是什么:输入是一组边,这些边定义了一个图。我们要找的是这个图的强连通分量。就是这么简单。

虽然你可以高效地做到这一点,但其实没有必要自己去实现这个!只需要导入一个好的图形库就可以了:

import networkx as nx

# one of your examples
g1 = nx.Graph([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
print nx.connected_components(g1) # [[57, 66, 76, 85], [1, 10, 3], [21, 15]]

# my own test case
g2 =  nx.Graph([(1,2),(2,10), (20,3), (3,4), (4,10)])
print nx.connected_components(g2) # [[1, 2, 3, 4, 10, 20]]

撰写回答