共享公共元素的合并列表

2024-05-15 14:43:21 发布

您现在位置:Python中文网/ 问答频道 /正文

我的输入是一个列表列表。他们中有些人有共同的因素

L = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]

我需要合并所有共享一个公共元素的列表,并重复此过程,只要不再有具有相同项的列表。我考虑过使用布尔运算和while循环,但没有找到一个好的解决方案。

最终结果应该是:

L = [['a','b','c','d','e','f','g','o','p'],['k']] 

Tags: 元素列表过程解决方案因素while
3条回答

您可以将列表看作是一个图的符号,即['a','b','c']是一个有3个节点相互连接的图。您试图解决的问题是找到connected components in this graph

您可以使用NetworkX来实现这一点,它的优点是几乎可以保证是正确的:

l = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]

import networkx 
from networkx.algorithms.components.connected import connected_components


def to_graph(l):
    G = networkx.Graph()
    for part in l:
        # each sublist is a bunch of nodes
        G.add_nodes_from(part)
        # it also imlies a number of edges:
        G.add_edges_from(to_edges(part))
    return G

def to_edges(l):
    """ 
        treat `l` as a Graph and returns it's edges 
        to_edges(['a','b','c','d']) -> [(a,b), (b,c),(c,d)]
    """
    it = iter(l)
    last = next(it)

    for current in it:
        yield last, current
        last = current    

G = to_graph(l)
print connected_components(G)
# prints [['a', 'c', 'b', 'e', 'd', 'g', 'f', 'o', 'p'], ['k']]

要想自己有效地解决这个问题,你必须把列表转换成类似于图表的东西,所以你最好从一开始就使用networkX。

我遇到了一个相同的问题,那就是试图合并具有共同值的列表。这个例子可能就是你想要的。 它只循环列表一次,并在运行时更新resultset。

lists = [['a','b','c'],['b','d','e'],['k'],['o','p'],['e','f'],['p','a'],['d','g']]
lists = sorted([sorted(x) for x in lists]) #Sorts lists in place so you dont miss things. Trust me, needs to be done.

resultslist = [] #Create the empty result list.

if len(lists) >= 1: # If your list is empty then you dont need to do anything.
    resultlist = [lists[0]] #Add the first item to your resultset
    if len(lists) > 1: #If there is only one list in your list then you dont need to do anything.
        for l in lists[1:]: #Loop through lists starting at list 1
            listset = set(l) #Turn you list into a set
            merged = False #Trigger
            for index in range(len(resultlist)): #Use indexes of the list for speed.
                rset = set(resultlist[index]) #Get list from you resultset as a set
                if len(listset & rset) != 0: #If listset and rset have a common value then the len will be greater than 1
                    resultlist[index] = list(listset | rset) #Update the resultlist with the updated union of listset and rset
                    merged = True #Turn trigger to True
                    break #Because you found a match there is no need to continue the for loop.
            if not merged: #If there was no match then add the list to the resultset, so it doesnt get left out.
                resultlist.append(l)
print resultlist

#

resultset = [['a', 'b', 'c', 'd', 'e', 'g', 'f', 'o', 'p'], ['k']]

算法:

  1. 从列表中获取第一组
  2. 对于列表中的其他集合B,如果B有公共元素,则将B连接到A中;从列表中移除B
  3. 重复2。直到不再与
  4. 投入产出
  5. 重复1。与其他列表一起

所以你可能想用集合而不是列表。下面的程序应该可以做到这一点。

l = [['a', 'b', 'c'], ['b', 'd', 'e'], ['k'], ['o', 'p'], ['e', 'f'], ['p', 'a'], ['d', 'g']]

out = []
while len(l)>0:
    first, *rest = l
    first = set(first)

    lf = -1
    while len(first)>lf:
        lf = len(first)

        rest2 = []
        for r in rest:
            if len(first.intersection(set(r)))>0:
                first |= set(r)
            else:
                rest2.append(r)     
        rest = rest2

    out.append(first)
    l = rest

print(out)

相关问题 更多 >