Merging arrays based on duplicate values in another array in Python?

3 Answers

User

Answer 1 · edited 2024-05-13 01:58:03

Edit: the solutions from @Austin and @Mad Physicist are better, so it is best to use those. Mine reinvents the wheel, which is not the Pythonic way.

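I think modifying the original lists in place is dangerous. This approach uses twice the memory, but iterating and building new lists this way is safe. What happens:

Iterate over a and look up the other positions in a that hold the same value (the current index is excluded with remove(i)).
If there are no duplicates, copy b and c over as usual.
If there are, merge them in temporary lists and then append those to a1, b1 and c1. The value is then blocked so that its later occurrences do not trigger another merge; the if at the start of the loop checks whether a value is blocked.
Return the new lists.

Although I did not use NumPy arrays for the data, I used np.where to find the indices because it is a bit faster than a list comprehension. Feel free to adapt the data format; mine is deliberately simple for demonstration.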
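As a minimal sketch of the duplicate lookup described in the first step, tried in isolation (the index i is picked only for illustration):

```python
import numpy as np

a = [1.0, 1.5, 1.5, 2, 2]
i = 1  # illustrative position; a[1] is 1.5

# Every index whose value equals a[i], including i itself.
ixs = np.where(np.array(a) == a[i])[0].tolist()
# Drop the current index so only the duplicates remain.
ixs.remove(i)
print(ixs)  # [2]
```

An empty list here would mean the value has no duplicates.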

import numpy as np

a = [1.0, 1.5, 1.5, 2, 2]
b = [[1, 2, 3, 4, 10], [4, 8, 10, 11, 5, 6, 12], [1, 5, 7], [70, 1, 2], [1]]
c = [[3, 4, 8], [5, 6, 12], [6, 7, 10, 123, 14], [70, 1, 2], [1, 5, 10, 4]]

def function(list1, list2, list3):
    a1 = []
    b1 = []
    c1 = []
    merged_list = []  # values that have already been handled (blocked)
    arr1 = np.array(list1)  # converted once so np.where can compare against it
    # enumerate preserves the original index
    for i, item in enumerate(list1):
        # to avoid merging twice, skip values from a that were already handled
        if item not in merged_list:
            # indices of every occurrence of item, minus our own index
            ixs = np.where(arr1 == item)[0].tolist()
            ixs.remove(i)
            if not ixs:
                # no duplicates: append as usual, nothing to merge
                a1.append(item)
                b1.append(list2[i])
                c1.append(list3[i])
            else:
                # temp b and c prefilled with copies of the first b and c
                temp1 = [*list2[i]]
                temp2 = [*list3[i]]
                for ix in ixs:
                    temp1.extend(list2[ix])
                    temp2.extend(list3[ix])
                a1.append(item)
                b1.append(temp1)
                c1.append(temp2)
            merged_list.append(item)
    return a1, b1, c1

a1, b1, c1 = function(a, b, c)
print(a1)
print(b1)
print(c1)

# example output
# [1.0, 1.5, 2]
# [[1, 2, 3, 4, 10], [4, 8, 10, 11, 5, 6, 12, 1, 5, 7], [70, 1, 2, 1]]
# [[3, 4, 8], [5, 6, 12, 6, 7, 10, 123, 14], [70, 1, 2, 1, 5, 10, 4]]

User

Answer 2 · edited 2024-05-13 01:58:03

Since a is sorted, you can use itertools.groupby over the range of indices in the list, keyed by a:

import numpy as np
from itertools import groupby

result_a = []
result_b = []
result_c = []

for k, group in groupby(range(len(a)), key=a.__getitem__):
    group = list(group)  # materialize the indices of this run
    index = slice(group[0], group[-1] + 1)
    result_a.append(k)
    result_b.append(np.concatenate(b[index]))
    result_c.append(np.concatenate(c[index]))

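group is an iterator, so you need to consume it to get the actual indices it represents. Each group contains all the indices that correspond to the same value in a.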
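To see what each group holds, here is a small sketch using the sample a from the other answers (the variable name groups is only illustrative):

```python
from itertools import groupby

a = [1.0, 1.5, 1.5, 2, 2]

# Group the positions 0..len(a)-1 by the value stored at each position.
groups = [(key, list(idx)) for key, idx in groupby(range(len(a)), key=a.__getitem__)]
print(groups)  # [(1.0, [0]), (1.5, [1, 2]), (2, [3, 4])]
```

Because a is sorted, the indices in each group are always consecutive, which is what makes slicing possible.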

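slice(...) is what gets passed to list.__getitem__ whenever there is a : in an indexing expression. Indexing with index is equivalent to [group[0]:group[-1] + 1]. This cuts out the portion of the list that corresponds to each key in a.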
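A quick way to convince yourself of the equivalence, as a sketch with an illustrative group of indices:

```python
b = [[1, 2, 3, 4, 10], [4, 8, 10, 11, 5, 6, 12], [1, 5, 7], [70, 1, 2], [1]]

group = [1, 2]  # indices of one run of equal keys (illustrative values)
index = slice(group[0], group[-1] + 1)

via_slice_object = b[index]
via_colon_syntax = b[group[0]:group[-1] + 1]
print(via_slice_object == via_colon_syntax)  # True
```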

Finally, np.concatenate just merges the arrays together in one batch.
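For instance, a sketch with two of the sample sublists (plain lists work too, since np.concatenate converts each one to an array):

```python
import numpy as np

parts = [[4, 8, 10, 11, 5, 6, 12], [1, 5, 7]]

# Join the 1-D sequences end to end into a single array.
merged = np.concatenate(parts)
print(merged.tolist())  # [4, 8, 10, 11, 5, 6, 12, 1, 5, 7]
```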

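If you want to do this without calling list(group), you can consume the iterator in other ways that do not keep the values around. For example, you can let groupby do it for you: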

import numpy as np
from itertools import groupby

result_a = []
result_b = []
result_c = []

prev = None

for k, group in groupby(range(len(a)), key=a.__getitem__):
    # only the first index of each run is needed; groupby skips the rest
    index = next(group)
    result_a.append(k)
    if prev is not None:
        result_b.append(np.concatenate(b[prev:index]))
        result_c.append(np.concatenate(c[prev:index]))
    prev = index

# flush the last run
if prev is not None:
    result_b.append(np.concatenate(b[prev:]))
    result_c.append(np.concatenate(c[prev:]))

At this point you do not even really need groupby, since keeping track of everything yourself is not much more work:

import numpy as np

result_a = []
result_b = []
result_c = []

k = None

for i, n in enumerate(a):
    if n == k:
        continue
    # n starts a new run; close the previous one, if any
    result_a.append(n)
    if k is not None:
        result_b.append(np.concatenate(b[prev:i]))
        result_c.append(np.concatenate(c[prev:i]))
    k = n
    prev = i

# flush the last run
if k is not None:
    result_b.append(np.concatenate(b[prev:]))
    result_c.append(np.concatenate(c[prev:]))
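For a self-contained run with the sample data, here is a sketch of the same manual-tracking idea wrapped in a function. The name merge_sorted_runs is hypothetical, and this variant tracks the start of each run instead of using a None sentinel, so it is a variation rather than the exact code above. It still assumes a is sorted (equal keys are adjacent).

```python
import numpy as np

def merge_sorted_runs(a, b, c):
    """Merge consecutive entries of b and c that share the same key in a."""
    result_a, result_b, result_c = [], [], []
    start = 0  # index where the current run of equal keys begins
    for i in range(1, len(a) + 1):
        # A run ends at the end of the list or where the key changes.
        if i == len(a) or a[i] != a[start]:
            result_a.append(a[start])
            result_b.append(np.concatenate(b[start:i]))
            result_c.append(np.concatenate(c[start:i]))
            start = i
    return result_a, result_b, result_c

a = [1.0, 1.5, 1.5, 2, 2]
b = [[1, 2, 3, 4, 10], [4, 8, 10, 11, 5, 6, 12], [1, 5, 7], [70, 1, 2], [1]]
c = [[3, 4, 8], [5, 6, 12], [6, 7, 10, 123, 14], [70, 1, 2], [1, 5, 10, 4]]

res_a, res_b, res_c = merge_sorted_runs(a, b, c)
print(res_a)                        # [1.0, 1.5, 2]
print([x.tolist() for x in res_b])  # [[1, 2, 3, 4, 10], [4, 8, 10, 11, 5, 6, 12, 1, 5, 7], [70, 1, 2, 1]]
```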

User

Answer 3 · edited 2024-05-13 01:58:03

Since a is sorted, I would use itertools.groupby. This is similar to @Mad Physicist's answer, but iterates over the zipped lists:

import numpy as np
from itertools import groupby

arr = np.array

a = [1.0, 1.5, 1.5, 2 , 2]
b = [arr([1, 2, 3, 4, 10]), arr([4, 8, 10, 11, 5, 6, 12]), arr([1, 5, 7]), arr([70, 1, 2]), arr([1])]
c = [arr([3, 4, 8]), arr([5, 6, 12]), arr([6, 7, 10, 123, 14]), arr([70, 1, 2]), arr([1, 5, 10, 4])]

res_a, res_b, res_c = [], [], []
for k, g in groupby(zip(a, b, c), key=lambda x: x[0]):
    g = list(g)
    res_a.append(k)
    res_b.append(np.concatenate([x[1] for x in g]))
    res_c.append(np.concatenate([x[2] for x in g]))

...which gives res_a, res_b and res_c as:

[1.0, 1.5, 2]
[array([ 1,  2,  3,  4, 10]), array([ 4,  8, 10, 11,  5,  6, 12,  1,  5,  7]), array([70,  1,  2,  1])]
[array([3, 4, 8]), array([  5,   6,  12,   6,   7,  10, 123,  14]), array([70,  1,  2,  1,  5, 10,  4])]

Alternatively, if a is not sorted, you can use a defaultdict:

import numpy as np
from collections import defaultdict

arr = np.array

a = [1.0, 1.5, 1.5, 2 , 2]
b = [arr([1, 2, 3, 4, 10]), arr([4, 8, 10, 11, 5, 6, 12]), arr([1, 5, 7]), arr([70, 1, 2]), arr([1])]
c = [arr([3, 4, 8]), arr([5, 6, 12]), arr([6, 7, 10, 123, 14]), arr([70, 1, 2]), arr([1, 5, 10, 4])]

res_a, res_b, res_c = [], [], []

d = defaultdict(list)

for x, y, z in zip(a, b, c):
    d[x].append([y, z])

for k, v in d.items():
    res_a.append(k)
    res_b.append(np.concatenate([x[0] for x in v]))
    res_c.append(np.concatenate([x[1] for x in v]))
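To illustrate why the sorted/unsorted distinction matters, here is a small sketch on deliberately unsorted, made-up data (plain lists instead of arrays to keep it short):

```python
from collections import defaultdict
from itertools import groupby

a = [2, 1.0, 2, 1.5, 1.0]  # deliberately unsorted, illustrative data
b = [[70], [1, 2], [1], [4, 8], [3]]

# groupby only merges adjacent equal keys, so repeated keys show up more than once.
grouped_keys = [k for k, _ in groupby(a)]
print(grouped_keys)  # [2, 1.0, 2, 1.5, 1.0]

# A defaultdict collects every occurrence of a key, wherever it appears.
d = defaultdict(list)
for key, values in zip(a, b):
    d[key].extend(values)
print(dict(d))  # {2: [70, 1], 1.0: [1, 2, 3], 1.5: [4, 8]}
```

The keys come out in first-seen order, since dicts preserve insertion order in Python 3.7+.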
