<p>Thanks to @Vince W. for pointing out that <code>np.where</code> should be used; I originally used a more complicated approach.</p>
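<p>Edit: see the answer by @user3483203 below, which is much faster than this one. You can gain a little more by performing the first few operations (diff, abs, equality comparison) on numpy arrays rather than pandas Series (about 2x faster when I reran their timings). numpy's <code>diff</code> differs from the pandas one in that it drops the first element instead of returning <code>NaN</code>. This means we get the index of the first row of each sign change rather than the second, and need to add one to get the next row.</p>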
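<p>As a small illustration of that off-by-one difference, here is a sketch on made-up sample values (the array below is hypothetical and not from the original question). It only compares where each <code>diff</code> reports a sign change.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the sign flips between rows 1->2 and rows 3->4.
vals = np.array([3.0, 1.0, -2.0, -4.0, 5.0])

# pandas keeps the original length and puts NaN in the first position,
# so a match lands on the SECOND row of each sign change.
pd_diff = pd.Series(np.sign(vals)).diff().abs()
pd_idx = np.flatnonzero(pd_diff.to_numpy() == 2)

# numpy drops the first element, so every position shifts down by one
# and a match lands on the FIRST row of each sign change.
np_diff = np.abs(np.diff(np.sign(vals)))
np_idx = np.flatnonzero(np_diff == 2)

print(pd_idx)      # positions reported via pandas
print(np_idx)      # positions reported via numpy
print(np_idx + 1)  # adding one recovers the pandas positions
```

<p>Under these assumptions, adding one to the numpy positions gives the same rows that the pandas version reports.</p>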
<pre class="lang-py prettyprint-override"><code>import numpy as np


def find_min_sign_changes(df):
    vals = df.value.values
    abs_sign_diff = np.abs(np.diff(np.sign(vals)))
    # idx of first row where the change is
    change_idx = np.flatnonzero(abs_sign_diff == 2)
    # +1 to get idx of second rows in the sign change too
    change_idx = np.stack((change_idx, change_idx + 1), axis=1)
    # now we have the locations where sign changes occur. We just need to extract
    # the `value` values at those locations to determine which of the two possibilities
    # to choose for each sign change (whichever has `value` closer to 0)
    min_idx = np.abs(vals[change_idx]).argmin(1)
    # for each pair, pick the row position whose `value` is closer to 0
    return df.iloc[change_idx[np.arange(len(change_idx)), min_idx]]
</code></pre>
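<p>The last two lines of the function rely on integer array indexing to pick one row out of each pair. The following sketch traces that step in isolation, assuming a made-up array and hand-written index pairs (both hypothetical, chosen only to show the mechanics).</p>

```python
import numpy as np

# Hypothetical sample data: the sign flips between rows 1->2 and rows 3->4.
vals = np.array([3.0, 2.0, -1.0, -0.5, 5.0])

# Each row holds the two positions on either side of one sign change.
change_idx = np.array([[1, 2],
                       [3, 4]])

# Absolute values at those positions, one row per sign change.
pair_vals = np.abs(vals[change_idx])

# Column (0 or 1) of the value closer to zero within each pair.
min_idx = pair_vals.argmin(axis=1)

# Pair each row number with its chosen column to pull out one position per pair.
chosen = change_idx[np.arange(len(change_idx)), min_idx]

print(pair_vals)
print(min_idx)
print(chosen)
```

<p>Here the first pair selects its second row and the second pair selects its first row, since those hold the values closer to zero.</p>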