<p>If you have SciPy, you could call <a href="http://docs.scipy.org/doc/scipy-dev/reference/generated/scipy.stats.binned_statistic.html" rel="nofollow noreferrer">scipy.stats.binned_statistic</a>:</p>
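<pre><code>import numpy as np
import scipy.stats as stats
# X, Y and bins are assumed to be defined already
statistic, bin_edges, binnumber = stats.binned_statistic(
    x=X, values=Y, statistic='median', bins=bins)
# empty bins yield nan; keep only the finite medians
statistic = statistic[np.isfinite(statistic)]
print(statistic)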
</code></pre>
<p>This prints the medians of the non-empty bins.</p>
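<p>As a self-contained illustration of the same call, here is a minimal sketch. The random sample data and the names <code>rng</code>, <code>medians</code> and <code>keep</code> are assumptions made for this example, not part of the original question. It also computes the bin centers from the returned edges so that they stay aligned with the medians after filtering.</p>

```python
import numpy as np
from scipy import stats

# Hypothetical sample data; substitute your own X, Y and bins.
rng = np.random.default_rng(0)
X = rng.random(50)
Y = rng.random(50)
bins = np.linspace(0.0, 1.0, 11)  # 10 equal-width bins on [0, 1]

medians, bin_edges, binnumber = stats.binned_statistic(
    X, Y, statistic='median', bins=bins)

# Midpoint of every bin, aligned one-to-one with `medians`.
bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2

# Empty bins come back as NaN; keep only the bins that contain data.
keep = np.isfinite(medians)
bin_centers, medians = bin_centers[keep], medians[keep]
```

<p>Applying the same boolean mask to both arrays keeps each median paired with the center of its bin.</p>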
<hr/>
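<p>If you do not have SciPy, then I think you need a list comprehension.
As you suggested, you can avoid the RuntimeWarning by filtering out the empty bins. An <code>if</code> condition inside the list comprehension does this:</p>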
<pre><code>masks = [(digitized == j) for j in range(1, len(bins))]
bin_medians = [np.median(Y[mask]) for mask in masks if mask.any()]
</code></pre>
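<p>The loop over <code>range(1, len(bins))</code> relies on how <code>np.digitize</code> numbers the bins: with increasing edges it returns <code>i</code> such that <code>bins[i-1] &lt;= x &lt; bins[i]</code>, so the first bin has index 1. A small sketch with made-up values shows this:</p>

```python
import numpy as np

# Made-up edges and samples, chosen only to illustrate the indexing.
bins = np.array([0.0, 0.5, 1.0])
X = np.array([0.0, 0.2, 0.5, 0.9, 1.0])

digitized = np.digitize(X, bins)
print(digitized.tolist())  # [1, 1, 2, 2, 3]
```

<p>Note that a value equal to the last edge gets index <code>len(bins)</code>, which a loop over <code>range(1, len(bins))</code> does not visit.</p>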
<p>Also note that the message you are seeing is a warning, not an exception. You could (alternatively) suppress it with</p>
<pre><code>import warnings
warnings.filterwarnings("ignore", 'Mean of empty slice.')
warnings.filterwarnings("ignore", 'invalid value encountered in double_scalar')
</code></pre>
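<p>Message-based filters depend on the exact wording of the warning text. If you would rather limit the suppression to one piece of code, a possible alternative (a sketch, not what the snippet above does) is the standard library's <code>warnings.catch_warnings</code> context manager combined with a category filter:</p>

```python
import warnings
import numpy as np

empty = np.array([], dtype=float)

with warnings.catch_warnings():
    # Ignore RuntimeWarnings only while this block runs.
    warnings.simplefilter("ignore", category=RuntimeWarning)
    result = np.median(empty)

print(result)  # nan
```

<p>Outside the <code>with</code> block the previous warning filters are restored, so unrelated RuntimeWarnings elsewhere in the program remain visible.</p>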
<hr/>
<p>There is a quicker way to compute the <code>bin_centers</code>:</p>
<pre><code>bin_centers = []
for j in range(len(bins) - 1):
    bin_centers.append((bins[j] + bins[j + 1]) / 2.)
</code></pre>
<p>can be simplified to</p>
<pre><code>bin_centers = bins[:-1] + (bins[1]-bins[0])/2
</code></pre>
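<p>The one-liner above assumes that <code>bins</code> is a NumPy array with equally spaced edges. If the edges are not uniform, a vectorized sketch that averages each pair of neighbouring edges gives the same result as the loop (the edge values below are made up for illustration):</p>

```python
import numpy as np

# Made-up, unevenly spaced bin edges.
bins = np.array([0.0, 1.0, 3.0, 7.0])

# Average each left edge with its right neighbour.
bin_centers = (bins[:-1] + bins[1:]) / 2
print(bin_centers.tolist())  # [0.5, 2.0, 5.0]
```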
<hr/>
<p>So, for example,</p>
<pre><code>import numpy as np
import matplotlib.pyplot as plt
import warnings

# Silence the warnings that np.median emits for an empty bin.
warnings.filterwarnings("ignore", 'Mean of empty slice.')
warnings.filterwarnings("ignore", 'invalid value encountered in double_scalar')

np.random.seed(123)
X = np.random.random(10)
bins = np.linspace(min(X), max(X), 10)
# Shift to 0-based indices; max(X) lands in the last index, len(bins) - 1.
digitized = np.digitize(X, bins)-1
bin_centers = bins + (bins[1]-bins[0])/2
Y = range(0, 100, 10)
Y = np.asarray(Y, dtype='float')
# Empty bins produce nan.
bin_medians = [np.median(Y[digitized == j]) for j in range(len(bins))]
print(bin_medians)
plt.scatter(bin_centers, bin_medians)
plt.show()
</code></pre>
<p>yields</p>
<pre><code>[15.0, 90.0, 50.0, 55.0, nan, 40.0, nan, nan, nan, 60.0]
</code></pre>
<p><img src="https://i.stack.imgur.com/Tl6du.png" alt="enter image description here"/></p>
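<p>If your purpose is only to make a scatter plot, then it is not necessary to remove the NaNs, since <code>matplotlib</code> ignores them anyway.</p>
<p>If you really want to remove the NaNs, you could use</p>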
<pre><code>no_nans = np.isfinite(bin_medians)
# bin_medians is a list, so convert it to an array before boolean indexing
bin_medians = np.asarray(bin_medians)[no_nans]
bin_centers = bin_centers[no_nans]
</code></pre>
<hr/>
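<p>Above, I chose to suppress the warnings with <code>warnings.filterwarnings</code>. If you would rather not suppress them, and instead want to filter the NaNs out of <code>bin_medians</code> and the corresponding positions out of <code>bin_centers</code>, then:</p>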
<pre><code>bin_centers = bins + (bins[1]-bins[0])/2
masks = [(digitized == j) for j in range(len(bins))]
bin_centers, bin_medians = zip(*[(center, np.median(Y[mask]))
for center, mask in zip(bin_centers, masks)
if mask.any()])
</code></pre>
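<p>The same idea can be wrapped in a small function. This is only a sketch: the name <code>median_per_bin</code> is hypothetical, and it assumes half-open bins numbered from 1 as <code>np.digitize</code> returns them, so a value equal to the last edge is not counted, unlike the 0-based indexing used in the example above.</p>

```python
import numpy as np

def median_per_bin(x, y, bins):
    """Hypothetical helper: medians of y per bin of x, skipping empty bins."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    bins = np.asarray(bins, dtype=float)

    digitized = np.digitize(x, bins)       # 1-based bin index for each x
    centers = (bins[:-1] + bins[1:]) / 2   # one center per bin

    out_centers, out_medians = [], []
    for j, center in enumerate(centers, start=1):
        mask = digitized == j
        if mask.any():                     # skip empty bins: no warning, no nan
            out_centers.append(center)
            out_medians.append(np.median(y[mask]))
    return np.array(out_centers), np.array(out_medians)

centers, medians = median_per_bin(
    x=[0.1, 0.2, 0.9], y=[1.0, 3.0, 5.0], bins=[0.0, 0.5, 1.0, 1.5])
print(centers.tolist(), medians.tolist())  # [0.25, 0.75] [2.0, 5.0]
```

<p>Because empty bins are skipped before <code>np.median</code> is called, no warning is raised and the two returned arrays always have the same length.</p>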