Python pandas: how to select rows where the sign changes and the value is smallest?

Posted 2024-05-16 02:18:13


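I am trying to find where a function crosses the x-axis, i.e. where its value passes through 0. I am using the fact that the function's sign changes whenever it crosses the x-axis.

I now have a DataFrame like the one below, and I want to find the two rows closest to 0, assuming the function crosses the x-axis at two points.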

     A     value
0  105  0.662932
1  105  0.662932
2  107  0.052653 # sign changes here when A is 107
3  108 -0.228060 # among these two A 107 is closer to zero
4  110 -0.740819
5  112 -1.188906
6  142 -0.228060 # sign changes here when A is 142
7  143  0.052654 # among these two, A 143 is closer to zero
8  144  0.349638

Desired output:

     A     value
2  107  0.052653
7  143  0.052654
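For anyone who wants to reproduce the example, the table in the question can be rebuilt as below. This is only a sketch: the variable names `sign` and `crossing_starts` are my own, and the check assumes no value is exactly 0, so that a crossing always shows up as a jump of 2 in the sign.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "A": [105, 105, 107, 108, 110, 112, 142, 143, 144],
    "value": [0.662932, 0.662932, 0.052653, -0.228060, -0.740819,
              -1.188906, -0.228060, 0.052654, 0.349638],
})

# np.sign maps each value to +1 or -1 (assuming no exact zeros), so a
# crossing appears as a difference of +/-2 between consecutive rows.
sign = np.sign(df["value"].to_numpy())

# Positions of the row just BEFORE each sign change.
crossing_starts = np.flatnonzero(np.abs(np.diff(sign)) == 2)
print(crossing_starts)
```

Each position in `crossing_starts` and the row right after it form one candidate pair; the question is which row of each pair is closer to zero.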

3 answers

I found a simple solution:


Solution


Result

idx = find_closest_to_zero_idx(df.value.values)

df.loc[idx]

     A     value
2  107  0.052653
7  143  0.052654
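The definition of `find_closest_to_zero_idx` does not appear in the text above. The following is a hypothetical reconstruction, not the answerer's actual code: I am assuming it takes a 1-D array of values and returns, for each sign change, the position of whichever of the two neighbouring elements is closer to zero.

```python
import numpy as np
import pandas as pd


def find_closest_to_zero_idx(arr):
    """Hypothetical sketch: one position per sign change in `arr`,
    pointing at the neighbour with the smaller absolute value."""
    arr = np.asarray(arr, dtype=float)
    # Position i here means the sign differs between arr[i] and arr[i + 1].
    # Note: `!= 0` also flags steps to or from an exact zero.
    lower = np.where(np.diff(np.sign(arr)) != 0)[0]
    upper = lower + 1
    # Choose the upper neighbour only when it is strictly closer to zero.
    pick_upper = np.abs(arr[upper]) < np.abs(arr[lower])
    return np.where(pick_upper, upper, lower)


# Sample data copied from the question.
df = pd.DataFrame({
    "A": [105, 105, 107, 108, 110, 112, 142, 143, 144],
    "value": [0.662932, 0.662932, 0.052653, -0.228060, -0.740819,
              -1.188906, -0.228060, 0.052654, 0.349638],
})

idx = find_closest_to_zero_idx(df.value.values)
result = df.loc[idx]
print(result)
```

Because the sample frame has the default integer index, the positions returned by the function can be passed to `df.loc` directly; with any other index, `df.iloc[idx]` would be the safer choice.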

Slow, pure-pandas method

df['value_shifted'] = df.value.shift(-1)
df['sign_changed'] = np.sign(df.value.values) * np.sign(df.value_shifted.values)

# lower index where sign changes
idx = df[df.sign_changed == -1.0].index.values

# label both the lower and the upper row of each sign change with the same
# negative number so that we can group by it later.
for i in range(len(idx)):
    df.loc[ [idx[i], idx[i]+1], 'sign_changed'] = -1.0 * (i+1)

df1 = df[ np.sign(df.sign_changed) == -1.0]
df2 = df1.groupby('sign_changed')['value'].apply(lambda x: min(abs(x)))
df3 = df2.reset_index()

answer = df.merge(df3,on=['sign_changed','value'])
answer
     A     value  value_shifted  sign_changed
0  107  0.052653      -0.228060          -1.0
1  143  0.052654       0.349638          -2.0
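One caveat with the code above, as far as I can tell: the final merge joins on `value`, but `df3` holds `min(abs(x))`, so a pair whose closest value is negative would not match any row. A possible variant, sketched below, keeps track of the row label with `idxmin` instead of merging. The names `first`, `second`, and `label` are my own, and the sketch assumes that no row belongs to two crossings at once.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [105, 105, 107, 108, 110, 112, 142, 143, 144],
    "value": [0.662932, 0.662932, 0.052653, -0.228060, -0.740819,
              -1.188906, -0.228060, 0.052654, 0.349638],
})

sign = np.sign(df["value"])
# True on the second row of each sign change (pandas diff starts with NaN).
second = sign.diff().abs().eq(2)
# True on the first row of each sign change.
first = second.shift(-1, fill_value=False)

# Give both rows of a crossing the same label (1, 2, ...); NaN elsewhere.
label = first.cumsum().where(first | second)
in_pair = label.notna()

# Within each pair, pick the index label of the smallest absolute value.
closest_idx = df.loc[in_pair, "value"].abs().groupby(label[in_pair]).idxmin()
answer = df.loc[closest_idx]
print(answer)
```

This also avoids the Python-level loop over `idx`, although it is still pandas-heavy and unlikely to match the NumPy versions for speed.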

You can use numpy to generalize this approach:



This solution is also far more efficient, especially as the data grows.

In [122]: df = pd.concat([df]*100)

In [123]: %timeit chris(df)
870 µs ± 10 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [124]: %timeit nathan(df)
2.03 s ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [125]: %timeit df.loc[find_closest_to_zero_idx(df.value.values)]
1.81 ms ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

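Thanks to @Vince W. for pointing out that np.where should be used; I originally used a more convoluted approach.

Edit: see @user3483203's answer, which is much faster than this one. You can squeeze out a little more speed by doing the first few operations (diff, abs, equality comparison) on numpy arrays rather than on pandas Series (when I reran their timings, it was about 2x faster). numpy's diff differs from the pandas one in that it drops the first element instead of returning NaN there. This means we get the index of the first row of each sign change rather than the second, and need to add one to get the next row.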

def find_min_sign_changes(df):
    vals = df.value.values
    abs_sign_diff = np.abs(np.diff(np.sign(vals)))
    # idx of first row where the change is
    change_idx = np.flatnonzero(abs_sign_diff == 2)
    # +1 to get idx of second rows in the sign change too
    change_idx = np.stack((change_idx, change_idx + 1), axis=1)

    # now we have the locations where sign changes occur. We just need to extract
    # the `value` values at those locations to determine which of the two possibilities
    # to choose for each sign change (whichever has `value` closer to 0)

    min_idx = np.abs(vals[change_idx]).argmin(1)
    return df.iloc[change_idx[range(len(change_idx)), min_idx]]
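The off-by-one between the two `diff` implementations mentioned above is easy to check on a toy array. This is just an illustrative sketch; the array `signs` and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# Toy sign sequence with two changes: between positions 1-2 and 3-4.
signs = np.array([1.0, 1.0, -1.0, -1.0, 1.0])

np_diff = np.diff(signs)           # one element shorter than the input
pd_diff = pd.Series(signs).diff()  # same length, NaN in the first slot

# numpy reports the FIRST row of each sign change ...
np_pos = np.flatnonzero(np.abs(np_diff) == 2)
# ... while pandas reports the SECOND row.
pd_pos = np.flatnonzero(pd_diff.abs().eq(2).to_numpy())

print(np_pos, pd_pos)
```

So with `np.diff` the pair for each crossing is `(i, i + 1)`, which is what the `np.stack((change_idx, change_idx + 1), axis=1)` line in the function builds.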
