How to combine arrays only when they share a common value?

Posted 2024-04-28 21:59:58


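I have a file containing different molecules, from which I get many pairs of values (pairs of bonded atoms). If two pairs share a member, they are part of the same molecule. I am trying to find an efficient way in Python to group the atoms by the molecule they belong to.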

For example, ethane and methane would be:

1, 5 and 9 are carbon atoms; the rest are hydrogen.

[[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]

I want to get a list/array like this:

[[1,2,3,4,5,6,7,8],[9,10,11,12,13]]

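I have tried several approaches, but they are far too slow for files with a large number of atoms. There should be a clever way to do this, but I can't find it. Any ideas?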

Thank you, 琼


3 answers

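If I understand correctly, what you want is to find the connected components of a graph in which each node is an atom and each edge is a bond (so each connected component is a molecule). SciPy has an efficient implementation in `scipy.sparse.csgraph`.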
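To make the idea concrete, here is a minimal pure-Python sketch of the same connected-components search using breadth-first search over an adjacency list, with no SciPy involved. It is an illustration of the technique, not the answerer's code; the function name `group_atoms` is hypothetical. It runs in time linear in the number of atoms plus bonds.

```python
from collections import defaultdict, deque

def group_atoms(bonds):
    """Hypothetical helper: molecules = connected components of the bond graph (BFS)."""
    # Undirected adjacency list: each bond links both atoms
    adjacency = defaultdict(set)
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    molecules = []
    for start in adjacency:  # every atom that appears in at least one bond
        if start in seen:
            continue
        # Breadth-first search collects every atom reachable from `start`
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            atom = queue.popleft()
            component.append(atom)
            for neighbour in adjacency[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        molecules.append(sorted(component))
    return molecules

bonds = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]
print(group_atoms(bonds))
# [[1, 2, 3, 4, 5, 6, 7, 8], [9, 10, 11, 12, 13]]
```

Because it only visits atoms that actually occur in a bond, unused atom numbers never show up as isolated nodes.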

First, let's build the graph as a sparse matrix:

import numpy as np
import scipy.sparse as sps

# Input as provided
edges = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]
# For each [x,y], also add [y,x]. Building a set first
# ensures that no edge is duplicated.
edges = list({(x, y) for x, y in edges} | {(y, x) for x, y in edges})
# Build the adjacency matrix. All edge weights are set to 1,
# since the weights don't matter here.
rows, cols = np.array(edges).T
graph = sps.csr_matrix((np.ones(len(edges)), (rows, cols)))

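At this point you can simply call `csgraph.connected_components(graph)`, but by default it returns the result in a slightly different format: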

(3, array([0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]))
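For reference, a self-contained sketch of that call. As an assumption beyond the answer, it skips the symmetrization step and passes `directed=False` to `scipy.sparse.csgraph.connected_components`, which treats every stored entry as an undirected edge. The return value is a tuple `(n_components, labels)`, where `labels[i]` is the component of node `i`.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

edges = np.array([[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],
                  [9,10],[9,11],[9,12],[9,13]])
n_nodes = edges.max() + 1  # node ids 0..13; node 0 is never used

# One entry per bond; the reversed pairs are unnecessary with directed=False
graph = coo_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])),
                   shape=(n_nodes, n_nodes))

n_components, labels = connected_components(graph, directed=False)
print(n_components)  # 3: the two molecules plus the unused node 0
print(labels)        # one component label per node id
```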

Let's reshape it a little:

import numpy as np
from scipy.sparse import csgraph

n_components, labels = csgraph.connected_components(graph)
result = []
# Start at label 1: label 0 belongs to the unused node 0
for u in range(1, n_components):
    result.append(np.where(labels == u)[0])

result

[array([1, 2, 3, 4, 5, 6, 7, 8], dtype=int64),
 array([ 9, 10, 11, 12, 13], dtype=int64)]

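Also note that the `range` starts at 1: Python counts from 0, and because your atom numbering starts at 1, node 0 appears as an isolated node. If the atom numbering is not contiguous, you also need to skip the other isolated nodes, for example: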

result = [r for r in result if len(r) > 1]
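A possible variant for non-contiguous numbering and files with many molecules, sketched under my own assumptions (the helper name `molecules_from_bonds` is hypothetical). It remaps the atom ids that actually occur to dense indices with `np.unique(..., return_inverse=True)`, so no placeholder nodes are created. It then groups the labels with a single stable `argsort` instead of calling `np.where` once per molecule, which rescans the whole label array for every component.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components

def molecules_from_bonds(bonds):
    """Hypothetical helper: group arbitrary integer atom ids by molecule."""
    bonds = np.asarray(bonds)
    # Dense index 0..n-1 for every atom id that occurs; reshape because
    # some NumPy versions return the inverse flattened
    atom_ids, dense = np.unique(bonds, return_inverse=True)
    dense = dense.reshape(bonds.shape)
    n = len(atom_ids)
    graph = coo_matrix((np.ones(len(dense)), (dense[:, 0], dense[:, 1])),
                       shape=(n, n))
    n_components, labels = connected_components(graph, directed=False)

    # One stable sort puts atoms of the same component next to each other
    order = np.argsort(labels, kind="stable")
    sizes = np.bincount(labels, minlength=n_components)
    groups = np.split(atom_ids[order], np.cumsum(sizes)[:-1])
    return [g.tolist() for g in groups]

bonds = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]
print(molecules_from_bonds(bonds))
```

Within each group the atom ids come out in ascending order, because `np.unique` sorts them and the stable sort preserves that order.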

bigArr = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]] ## Your list of pairs of values
molArr = [] ## Each molecule is stored as a set of atom numbers
for pair in bigArr:
    merged = set(pair)
    rest = []
    for molecule in molArr:
        if molecule.isdisjoint(pair): ## No shared atom: keep this molecule separate
            rest.append(molecule)
        else: ## Shared atom: merge it (a pair may bridge two existing molecules)
            merged |= molecule
    rest.append(merged)
    molArr = rest

molArr = [sorted(m) for m in molArr] ## Convert the sets back to sorted lists

Another approach:

a = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]

result = []

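for sub in a:
    # Indices of every group that already contains an atom of this pair
    hits = [i for i, r in enumerate(result) if any(x in r for x in sub)]
    if hits:
        # Merge all matching groups into the first one (the pair may bridge groups);
        # pop from the end so the remaining indices stay valid
        first = result[hits[0]]
        for i in reversed(hits[1:]):
            first += [y for y in result.pop(i) if y not in first]
        first += [y for y in sub if y not in first]
    else:
        result.append(list(sub))  # copy, so the input list `a` is not mutated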

result
#[[1,2,3,4,5,6,7,8],[9,10,11,12,13]]
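The two list-based answers scan every existing group for every pair, which gets slow with many molecules. A common alternative, not one of the answers above, is a disjoint-set (union-find) structure. Below is a minimal sketch with path halving only (no union by rank); `group_atoms_union_find` is a hypothetical name.

```python
def group_atoms_union_find(bonds):
    """Hypothetical helper: group atoms by molecule with a disjoint-set forest."""
    parent = {}

    def find(x):
        # Walk up to the root, halving the path so later lookups are shorter
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # union: hang one tree under the other

    # Atoms with the same root belong to the same molecule
    groups = {}
    for atom in parent:
        groups.setdefault(find(atom), []).append(atom)
    return [sorted(g) for g in groups.values()]

bonds = [[1,2],[1,3],[1,4],[1,5],[5,6],[5,7],[5,8],[9,10],[9,11],[9,12],[9,13]]
print(group_atoms_union_find(bonds))
```

Each bond is processed once, and a pair that bridges two existing groups simply merges their trees, so no explicit group merging is needed.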
