Getting the indices for splitting a NumPy array

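# Body of the asker's function (the def line is missing from this page).
# x is a 1-D array, start is the first start index, n_chunks is the number of chunks.
ranges = np.zeros((n_chunks, 2), np.int64)
ranges_idx = 0
range_start_idx = start

running_sum = 0
for i in range(x.shape[0]):
    running_sum += x[i]
    if running_sum > x.sum() / n_chunks:
        ranges[ranges_idx, 0] = range_start_idx
        ranges[ranges_idx, 1] = min(i + 1, x.shape[0])  # Exclusive stop index
        # Reset and Update
        range_start_idx = i + 1
        ranges_idx += 1
        running_sum = 0

# Handle final range outside of for loop
ranges[ranges_idx, 0] = range_start_idx
ranges[ranges_idx, 1] = x.shape[0]
if ranges_idx < n_chunks - 1:
    # Mark unused trailing rows as empty. The source wrote `left[ranges_idx:]`,
    # but `left` is undefined; this replacement is an assumed fix.
    ranges[ranges_idx + 1:] = x.shape[0]
return ranges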

2 Answers

User

#1 · Edited 2024-04-27 05:10:17

Here is a solution that does not iterate over every element:

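import numpy as np

def fun2(array, n):
    min_sum = np.sum(array) / n  # target sum per chunk
    cumsum = np.cumsum(array)
    i = -1  # last index of the previous chunk
    count = min_sum  # cumulative total the current chunk must reach
    out = []
    while i < len(array) - 1:
        # Binary search for the first index whose cumulative sum reaches count
        j = np.searchsorted(cumsum, count)
        out.append([i + 1, j + 1])  # [start, exclusive stop]
        i = j
        if i < len(array):
            count = cumsum[i] + min_sum
    # Clamp the final exclusive stop to the array length. The original
    # `out[-1][1] -= 1` dropped the last element when j == len(array) - 1.
    out[-1][1] = len(array)
    return np.array(out)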

For both test cases, it produces the results you expect.

User

#2 · Edited 2024-04-27 05:10:17

I took inspiration from a similar question that was answered:
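The core of this approach is a binary search on the cumulative sum. Below is a minimal sketch of that single step, using made-up sample data; the array `weights` and the names `target` and `first` are illustrative assumptions and do not come from the answer.

```python
import numpy as np

# Hypothetical sample data, not taken from the question.
weights = np.array([3, 1, 4, 1, 5, 9, 2, 6])
cumsum = np.cumsum(weights)      # running totals: 3, 4, 8, 9, 14, 23, 25, 31
target = weights.sum() / 2       # 15.5, half of the total

# np.searchsorted does a binary search on the sorted running totals and
# returns the first position whose running total is >= target.
first = int(np.searchsorted(cumsum, target))
print(first)  # 5
```

Because `np.searchsorted` defaults to `side='left'`, it finds the first running total that is greater than or equal to the target, whereas the loop in the question uses a strict `>` comparison. The two can therefore differ when a running total hits the target exactly.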

import numpy as np

def func(x, n):
    out = np.zeros((n, 2), np.int64)
    cum_arr = x.cumsum() / x.sum()
    idx = 1 + np.searchsorted(cum_arr, np.linspace(0, 1, n, endpoint=False)[1:])
    out[1:, 0] = idx  # Fill the first column with start indices
    out[:-1, 1] = idx  # Fill the second column with exclusive stop indices
    out[-1, 1] = x.shape[0]  # Handle the stop index for the final chunk
    return out
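As a quick check of what this returns, here is a small usage sketch. The function body repeats the logic above so the snippet runs on its own; the sample array and the `chunk_sums` helper variable are assumptions made for illustration.

```python
import numpy as np

def func(x, n):
    # Same logic as the answer's func, repeated so this snippet is self-contained.
    out = np.zeros((n, 2), np.int64)
    cum_arr = x.cumsum() / x.sum()
    idx = 1 + np.searchsorted(cum_arr, np.linspace(0, 1, n, endpoint=False)[1:])
    out[1:, 0] = idx
    out[:-1, 1] = idx
    out[-1, 1] = x.shape[0]
    return out

x = np.array([4, 1, 1, 2, 8])  # hypothetical sample input
ranges = func(x, 2)
chunk_sums = [int(x[a:b].sum()) for a, b in ranges]
print(ranges.tolist())  # [[0, 4], [4, 5]]
print(chunk_sums)       # [8, 8]
```

The chunks hold four elements and one element respectively, which shows that the split balances the sums of the chunks and not their lengths.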

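Update

To cover pathological cases, where two consecutive split indices coincide and a chunk would be empty, we need to be a little more careful and do the following: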

def func(x, n, truncate=False):
    out = np.zeros((n, 2), np.int64)
    cum_arr = x.cumsum() / x.sum()
    idx = 1 + np.searchsorted(cum_arr, np.linspace(0, 1, n, endpoint=False)[1:])
    out[1:, 0] = idx  # Fill the first column with start indices
    out[:-1, 1] = idx  # Fill the second column with exclusive stop indices
    out[-1, 1] = x.shape[0]  # Handle the stop index for the final chunk

    # Handle pathological case: repeated split indices mean some chunks are empty
    diff_idx = np.diff(idx)
    if np.any(diff_idx == 0):
        row_truncation_idx = np.argmin(diff_idx) + 2
        out[row_truncation_idx:, 0] = x.shape[0]
        out[row_truncation_idx-1:, 1] = x.shape[0]
        if truncate:
            out = out[:row_truncation_idx]

    return out
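To see what triggers the pathological branch, the sketch below computes only the split indices for an input in which one element dominates the total. The input array is an assumption chosen for illustration; it does not come from the original question.

```python
import numpy as np

x = np.array([1, 1, 100, 1])  # hypothetical input: one element dominates
n = 4

cum_arr = x.cumsum() / x.sum()
# The 25%, 50% and 75% marks all fall inside the dominant element,
# so every split index lands on the same position.
idx = 1 + np.searchsorted(cum_arr, np.linspace(0, 1, n, endpoint=False)[1:])
diff_idx = np.diff(idx)

print(idx.tolist())       # [3, 3, 3]
print(diff_idx.tolist())  # [0, 0]
```

A zero in `diff_idx` means two neighbouring split points are equal, which is exactly the condition `np.any(diff_idx == 0)` tests for before the trailing rows are overwritten or truncated.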
