NumPy histogram: retrieving the sum of squared weights in each bin

Posted 2024-03-29 02:36:06


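In numpy (or scipy), is it possible to retrieve the sum of squared weights in each bin of a histogram? I want an error on the height of each bin in my histogram. For unweighted data, the statistical error on each bin height should be sqrt(N), where N is the bin height. For weighted data, however, I need the sum of the squared weights. numpy.histogram can't do this, but is there some other functionality in numpy or scipy that can bin one array (e.g. the array of weights) based on a different array (e.g. the array of values I am histogramming)? I have looked through the documentation carefully but found nothing.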


2 Answers

If I may add a complement to @obachtos's answer, I have expanded it into a function that handles the full histogram:

import numpy as np


def hist_bin_uncertainty(data, weights, bin_edges):
    """
    The statistical uncertainty per bin of the binned data.
    If there are weights then the uncertainty will be the root of the
    sum of the weights squared.
    If there are no weights (weights = 1) this reduces to the root of
    the number of events.

    Args:
        data: `array`, the data being histogrammed.
        weights: `array` or None, the associated weights of the `data`.
        bin_edges: `array`, the edges of the bins of the histogram.

    Returns:
        bin_uncertainties: `array`, the statistical uncertainty on the bins.

    Example:
    >>> x = np.array([2,9,4,8])
    >>> w = np.array([0.1,0.2,0.3,0.4])
    >>> edges = [0,5,10]
    >>> hist_bin_uncertainty(x, w, edges)
    array([0.31622777, 0.4472136 ])
    >>> hist_bin_uncertainty(x, None, edges)
    array([1.41421356, 1.41421356])
    >>> hist_bin_uncertainty(x, np.ones(len(x)), edges)
    array([1.41421356, 1.41421356])
    """
    data = np.asarray(data)
    bin_edges = np.asarray(bin_edges)
    if weights is None:
        # Default to weights of 1 and thus uncertainty = sqrt(N)
        weights = np.ones(len(data))
    else:
        weights = np.asarray(weights)

    # Bound the data and weights to be within the bin edges. The lowest edge
    # is inclusive and the highest exclusive, which is how np.digitize
    # assigns bins (np.histogram would also count data == max(bin_edges)).
    in_range = (data >= bin_edges.min()) & (data < bin_edges.max())
    in_range_data = data[in_range]
    in_range_weights = weights[in_range]

    # Bin the weights with the same binning as the data
    bin_index = np.digitize(in_range_data, bin_edges)
    # N.B.: range(1, len(bin_edges)) is used instead of set(bin_index) as if
    # there is a gap in the data such that a bin is skipped no index would appear
    # for it in the set
    # A plain list is kept here because bins can hold different numbers of
    # entries, and recent NumPy refuses to build an array from ragged rows.
    binned_weights = [in_range_weights[bin_index == idx]
                      for idx in range(1, len(bin_edges))]
    bin_uncertainties = np.asarray(
        [np.sqrt(np.sum(np.square(w))) for w in binned_weights])
    return bin_uncertainties
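
As a cross-check on the function above, here is a minimal sketch that is not part of the original answer. It relies on the `weights` argument of `np.histogram`: passing the squared weights makes each bin hold the sum of squared weights directly. The sample arrays are the ones from the docstring; the variable names `sum_w`, `sum_w2` and `errors` are only illustrative.

```python
import numpy as np

x = np.array([2, 9, 4, 8])
w = np.array([0.1, 0.2, 0.3, 0.4])
edges = [0, 5, 10]

# Weighted bin heights: sum of w in each bin
sum_w, _ = np.histogram(x, bins=edges, weights=w)

# Sum of squared weights in each bin, using the same binning
sum_w2, _ = np.histogram(x, bins=edges, weights=w**2)

# Statistical uncertainty per bin
errors = np.sqrt(sum_w2)
print(sum_w, errors)
```

One difference to keep in mind: `np.histogram` counts values equal to the last edge in the last bin, whereas `np.digitize` with its default settings places them past the last bin.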

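As Alex suggested, numpy.digitize is what you want. The function returns the index of the bin that each entry of your x array belongs to. You can then use this information to access the correct elements of w: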

import numpy as np

x = np.array([2,9,4,8])
w = np.array([0.1,0.2,0.3,0.4])

# bin index (starting at 1) for every entry of x
bins = np.digitize(x, [0,5,10])

# access elements for first bin
first_bin_ws = w[np.where(bins==1)[0]]

# error of first bin
error = np.sqrt(np.sum(first_bin_ws**2.))

The last line computes the error for the first bin. Note that np.digitize starts counting at 1.
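
To get the error of every bin at once rather than just the first, one possible extension is sketched below. It is an addition to the answer, assuming `np.bincount` with its `weights` and `minlength` arguments is acceptable here; the names `edges`, `n_bins`, `sum_w2` and `errors` are illustrative only.

```python
import numpy as np

x = np.array([2, 9, 4, 8])
w = np.array([0.1, 0.2, 0.3, 0.4])
edges = np.array([0, 5, 10])

# 1-based bin index per entry; 0 means below the first edge,
# len(edges) means at or above the last edge
bins = np.digitize(x, edges)
n_bins = len(edges) - 1

# Sum w**2 per bin index; minlength guarantees slots for empty bins
# and for the underflow (0) and overflow (n_bins + 1) indices
sum_w2 = np.bincount(bins, weights=w**2, minlength=n_bins + 2)

# Drop underflow and overflow, keep only the real bins
errors = np.sqrt(sum_w2[1:n_bins + 1])
print(errors)
```

Because the indices returned by np.digitize start at 1, slot 0 of `sum_w2` collects anything below the first edge and is discarded along with the overflow slot.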
