Python pandas: how do I select rows where the sign changes and the value is closest to zero?

     A     value
0  105  0.662932
1  105  0.662932
2  107  0.052653  # sign changes here when A is 107
3  108 -0.228060  # among these two, A 107 is closer to zero
4  110 -0.740819
5  112 -1.188906
6  142 -0.228060  # sign changes here when A is 142
7  143  0.052654  # among these two, A 143 is closer to zero
8  144  0.349638

3 answers

User

Answer 1 · edited 2024-05-16 02:18:13

I found a simple solution:


Solution

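The body of find_closest_to_zero_idx is not preserved in this copy of the answer. What follows is only a minimal sketch of what such a helper might look like, assuming it takes a 1-D array and returns the positions of the rows to keep; the implementation is an assumption, not the answer's original code. The sample frame is rebuilt from the data in the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "A": [105, 105, 107, 108, 110, 112, 142, 143, 144],
    "value": [0.662932, 0.662932, 0.052653, -0.228060, -0.740819,
              -1.188906, -0.228060, 0.052654, 0.349638],
})


def find_closest_to_zero_idx(arr):
    """Assumed implementation: for every pair of neighbours with opposite
    signs, return the position of the one whose value is closer to zero."""
    arr = np.asarray(arr, dtype=float)
    # positions i where arr[i] and arr[i + 1] have opposite signs
    first = np.flatnonzero(np.sign(arr[:-1]) * np.sign(arr[1:]) == -1)
    # True where the second element of the pair is the closer one
    second_is_closer = np.abs(arr[first + 1]) < np.abs(arr[first])
    # adding the boolean mask moves those positions from i to i + 1
    return first + second_is_closer
```

Because the sketch returns positions, using them with df.loc only works when the frame has the default 0..n-1 index, as it does here.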

Result

idx = find_closest_to_zero_idx(df.value.values)

df.loc[idx]

     A     value
2  107  0.052653
7  143  0.052654

A slow, pure-pandas approach

df['value_shifted'] = df.value.shift(-1)
df['sign_changed'] = np.sign(df.value.values) * np.sign(df.value_shifted.values)

# lower index where sign changes
idx = df[df.sign_changed == -1.0].index.values

# label both rows of each sign change (lower and upper index) with the same
# unique negative number so that we can group by it later.
for i in range(len(idx)):
    df.loc[ [idx[i], idx[i]+1], 'sign_changed'] = -1.0 * (i+1)

df1 = df[ np.sign(df.sign_changed) == -1.0]
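# per sign change, keep the signed value closest to zero; taking min(abs(x))
# would not match a negative value in the merge below
df2 = df1.groupby('sign_changed')['value'].apply(lambda x: x.loc[x.abs().idxmin()])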
df3 = df2.reset_index()

answer = df.merge(df3,on=['sign_changed','value'])
answer
     A     value  value_shifted  sign_changed
0  107  0.052653      -0.228060          -1.0
1  143  0.052654       0.349638          -2.0

User

Answer 2 · edited 2024-05-16 02:18:13

You can use numpy to generalize this approach:


This solution is also much more efficient, especially as the data grows.

In [122]: df = pd.concat([df]*100)

In [123]: %timeit chris(df)
870 µs ± 10 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [124]: %timeit nathan(df)
2.03 s ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [125]: %timeit df.loc[find_closest_to_zero_idx(df.value.values)]
1.81 ms ± 12.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
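One thing to watch when reproducing these timings: pd.concat([df]*100) repeats the original index labels, so if a helper returns positions (an assumption about the functions timed above), passing them to the label-based df.loc can select more rows than intended. A small sketch with made-up frame names (base, big, unique) shows the difference:

```python
import pandas as pd

base = pd.DataFrame({"A": [107, 108], "value": [0.052653, -0.228060]})

# Without ignore_index the labels repeat: 0, 1, 0, 1, 0, 1
big = pd.concat([base] * 3)
print(len(big.loc[[0]]))      # 3 -> every row labelled 0
print(len(big.iloc[[0]]))     # 1 -> only the row at position 0

# ignore_index=True rebuilds a unique 0..n-1 index
unique = pd.concat([base] * 3, ignore_index=True)
print(len(unique.loc[[0]]))   # 1
```

Using iloc, or concatenating with ignore_index=True, keeps positions and labels in agreement.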

User

Answer 3 · edited 2024-05-16 02:18:13


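Thanks to @Vince W. for pointing out that np.where should be used; I was originally using a more convoluted approach.

Edit: see @user3483203's answer below, which is much faster than this one. You can squeeze out a bit more (about 2x faster when I reran their timings) by performing the first few operations (diff, abs, equality comparison) on numpy arrays rather than on pandas Series. numpy's diff differs from the one in pandas in that it drops the first element instead of returning NaN. This means we get the index of the first row of each sign change rather than the second, and need to add one to get the next row.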
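The off-by-one described above can be checked on a small array. This is only an illustration; the values are made up, loosely following the question's data.

```python
import numpy as np
import pandas as pd

values = np.array([0.66, 0.05, -0.23, -0.74, -0.23, 0.05])
signs = np.sign(values)

# pandas keeps the length and puts NaN in front, so a jump of 2 is
# reported at the second row of each sign change
pandas_pos = np.flatnonzero(pd.Series(signs).diff().abs().to_numpy() == 2)
print(pandas_pos.tolist())   # [2, 5]

# numpy drops the first element, so the same jump is reported at the
# first row of each sign change
numpy_pos = np.flatnonzero(np.abs(np.diff(signs)) == 2)
print(numpy_pos.tolist())    # [1, 4]
```

So with np.diff the flagged positions are the first rows of each sign change, and adding 1 gives their partner rows.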

import numpy as np

def find_min_sign_changes(df):
    vals = df.value.values
    abs_sign_diff = np.abs(np.diff(np.sign(vals)))
    # idx of first row where the change is
    change_idx = np.flatnonzero(abs_sign_diff == 2)
    # +1 to get idx of second rows in the sign change too
    change_idx = np.stack((change_idx, change_idx + 1), axis=1)

    # now we have the locations where sign changes occur. We just need to extract
    # the `value` values at those locations to determine which of the two possibilities
    # to choose for each sign change (whichever has `value` closer to 0)

    min_idx = np.abs(vals[change_idx]).argmin(1)
    return df.iloc[change_idx[range(len(change_idx)), min_idx]]
