Grouping overlapping integer pairs into a smaller array

Posted 2024-05-14 10:21:27

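I have numbers in an Nx2 array, and I want to reduce it to the minimum and maximum of each overlapping group, giving a smaller two-column array.

Here, pairs belong to the same group if a number on either side of one pair falls within another pair, and this extends transitively across all pairs. In all cases, each final pair covers exactly one run of directly adjacent numbers.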

import numpy as np
x = np.array([
       [ 45,  47], #group 1
       [ 46,  47], #group 1
       [ 53,  54], #group 2
       [ 63,  66], #group 3
       [ 64,  66], #group 3
       [ 65,  66], #group 3
       [ 66,  67], #group 3
       [ 68,  70], #group 4
       [ 69,  70], #group 4
       [ 70,  71], #group 4
       [ 70,  72], #group 4
       [ 80,  81], #group 5
       [ 92,  93], #group 6
       [ 94,  95], #group 7
       [ 94,  96], #group 7
       [ 94,  97], #group 7
       [ 94,  98], #group 7
       [103, 104]]) #group 8

Expected output:

array([
    [45, 47], #g1
    [53, 54], #g2
    [63, 67], #g3
    [68, 72], #g4
    [80, 81], #g5
    [92, 93], #g6
    [94, 98], #g7
    [103, 104]]) #g8

2 answers

Assuming the regions are already sorted (by start value):

def merge_regions(regions):
    # Nothing to merge for an empty input
    if len(regions) == 0:
        return []
    # Start with the first region
    final_regions = [regions[0]]
    for i in range(1, len(regions)):
        region = regions[i]
        last_region = final_regions[-1]
        if region[0] <= last_region[1]:
            # Regions overlap, get the new end
            new_end = max(region[1], last_region[1])
            final_regions[-1] = [last_region[0], new_end]
        else:
            final_regions.append(region)
    return final_regions

Input:

[
       [ 45,  47], #group 1
       [ 46,  47], #group 1
       [ 53,  54], #group 2
       [ 63,  66], #group 3
       [ 64,  66], #group 3
       [ 65,  66], #group 3
       [ 66,  67], #group 3
       [ 68,  70], #group 4
       [ 69,  70], #group 4
       [ 70,  71], #group 4
       [ 70,  72], #group 4
       [ 80,  81], #group 5
       [ 92,  93], #group 6
       [ 94,  95], #group 7
       [ 94,  96], #group 7
       [ 94,  97], #group 7
       [ 94,  98], #group 7
       [103, 104]]

Output:

[[45, 47],
 [53, 54],
 [63, 67],
 [68, 72],
 [80, 81],
 [92, 93],
 [94, 98],
 [103, 104]]
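
The same merge can also be done without a Python loop. The following is a minimal NumPy sketch, not part of the original answer: the helper name `merge_regions_np` is hypothetical, and it assumes every pair satisfies start <= end. It sorts by start value first, so the input does not need to be pre-sorted.

```python
import numpy as np

def merge_regions_np(pairs):
    """Merge overlapping [start, end] pairs (hypothetical helper, assumes start <= end)."""
    pairs = np.asarray(pairs)
    if pairs.size == 0:
        return pairs.reshape(0, 2)
    # Sort by start so that overlapping pairs become neighbours
    pairs = pairs[np.argsort(pairs[:, 0], kind="stable")]
    starts, ends = pairs[:, 0], pairs[:, 1]
    # Largest end value seen so far, up to and including each row
    running_end = np.maximum.accumulate(ends)
    # A row opens a new group when its start exceeds every earlier end
    is_new = np.ones(len(pairs), dtype=bool)
    is_new[1:] = starts[1:] > running_end[:-1]
    first = np.flatnonzero(is_new)
    # The last row of each group is the row just before the next group's first row
    last = np.append(first[1:] - 1, len(pairs) - 1)
    return np.column_stack((starts[first], running_end[last]))
```

Calling `merge_regions_np(x)` on the array from the question should give the same eight pairs as above. Because it compares against the running maximum of the end values, a short pair nested inside an earlier, longer one does not start a new group.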

If pandas is an option, you can group by overlapping intervals and aggregate new start and end values for each group:

import pandas as pd

df = pd.DataFrame(x, columns = ['start','end'])
df.groupby((~df.end.shift().ge(df.start)).cumsum()).agg({'start':'min', 'end':'max'}).to_numpy()

Output:

array([[ 45,  47],
       [ 53,  54],
       [ 63,  67],
       [ 68,  72],
       [ 80,  81],
       [ 92,  93],
       [ 94,  98],
       [103, 104]])
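
One caveat with the one-liner: it compares each start only with the end of the row directly before it. That works for the data in the question, but with an input such as `[[1, 10], [2, 3], [4, 5]]` the last pair would be split off, even though it lies inside the first. A possible variant, sketched here under the assumption that start <= end in every pair, compares against the running maximum of the end values instead. The function name `merge_intervals_pd` is hypothetical.

```python
import pandas as pd

def merge_intervals_pd(pairs):
    """Merge overlapping [start, end] pairs with pandas (hypothetical helper)."""
    df = pd.DataFrame(pairs, columns=["start", "end"])
    # Sort by start so that overlapping intervals are adjacent
    df = df.sort_values("start", kind="stable")
    # Largest end among all earlier rows (NaN for the first row)
    prev_max_end = df["end"].cummax().shift()
    # A new group begins whenever a start lies beyond every earlier end
    group_id = (df["start"] > prev_max_end).cumsum()
    merged = df.groupby(group_id).agg({"start": "min", "end": "max"})
    return merged.to_numpy()
```

For example, `merge_intervals_pd([[1, 10], [2, 3], [4, 5]])` keeps all three pairs in a single group spanning 1 to 10.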
