How to update a DataFrame based on information from the previous row

Posted 2024-05-15 23:03:32


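I have the following DataFrame in pandas. I want to check whether the HH value is greater than the High value of the previous row. If it is, I want to update the previous row's HH value and replace the current row's HH with NaN.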

How can I check whether HH is greater than the previous row's High, and update the data following the procedure above?


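Note that I don't want to shift all of the data in the column (so I don't think `shift` is the solution). I only want to change one specific value, based on the previous row's "High".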
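On the concern about `shift`: `Series.shift` returns a new, realigned Series and leaves the original column untouched, so it can be used purely as a comparison against the previous row without moving any stored data. A minimal sketch with made-up values (not the question's data):

```python
import numpy as np
import pandas as pd

# Illustrative values only, not taken from the question
df = pd.DataFrame({"High": [1.0, 2.0, 3.0], "HH": [np.nan, 2.5, np.nan]})

prev_high = df["High"].shift(1)  # new Series: row i holds High from row i-1
mask = df["HH"] > prev_high      # comparisons involving NaN evaluate to False

print(prev_high.tolist())   # [nan, 1.0, 2.0]
print(mask.tolist())        # [False, True, False]
print(df["High"].tolist())  # [1.0, 2.0, 3.0]  (original column unchanged)
```

The mask can then be passed to `df.loc[...]` so that only the selected cells are written.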

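About the program:

I am trying to build a program that finds the local minima and maxima of a given financial market. I am using the "peakdetect" library: https://pypi.org/project/peakdetect/

It only returns two-dimensional lists of maxima and minima: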

import pandas as pd
from peakdetect import peakdetect

# clean_dataframe: the OHLC DataFrame of market data (defined elsewhere)
density = 2
# Temporary references to the arrays of maxima and minima
high_arr = peakdetect(y_axis=clean_dataframe['High'], x_axis=clean_dataframe.index, lookahead=density)
low_arr = peakdetect(y_axis=clean_dataframe['Low'], x_axis=clean_dataframe.index, lookahead=density)

# first element is always the list of maxima
_hh = pd.DataFrame(high_arr[0])
_hh = _hh.rename(columns={0: 'Index', 1: 'HH'})

# second element is always the list of minima
_ll = pd.DataFrame(low_arr[1])
_ll = _ll.rename(columns={0: 'Index', 1: 'LL'})

# join all minima and maxima to the original DataFrame
full_df = clean_dataframe.join(_hh.set_index('Index')).join(_ll.set_index('Index'))
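As a possible alternative to the two `join` calls, each peak list can be turned into a `pd.Series` indexed by the peak positions and assigned directly. pandas aligns on the index and fills rows without a peak with NaN. The sketch below does not call peakdetect: `peaks` is a hand-written, hypothetical list in the `[[x, y], ...]` shape that the code above assumes for each half of the result, `prices` stands in for `clean_dataframe`, and `peaks_to_column` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical stand-in for clean_dataframe
idx = pd.to_datetime(["2021-02-04 19:00", "2021-02-04 20:00", "2021-02-04 21:00",
                      "2021-02-04 22:00", "2021-02-04 23:00"])
prices = pd.DataFrame({"High": [1.0, 3.0, 2.0, 4.0, 1.5]}, index=idx)

# Hypothetical peak list in [[x, y], ...] form (x = index label, y = value)
peaks = [[idx[1], 3.0], [idx[3], 4.0]]

def peaks_to_column(peak_list):
    """Hypothetical helper: turn [[x, y], ...] pairs into a Series keyed by x."""
    if not peak_list:
        return pd.Series(dtype=float)
    xs, ys = zip(*peak_list)
    return pd.Series(ys, index=pd.Index(xs), dtype=float)

# Assignment aligns on the index; rows without a peak become NaN
prices["HH"] = peaks_to_column(peaks)
print(prices)
```

This keeps the peak values keyed by their original index labels, so no intermediate `Index` column is needed.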

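Clean DataFrame result:

The problem is that some LL (valley) values are inaccurate: sometimes the previous row's Low is the correct LL, so I have to check for this and change the LL rows as marked in the picture.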
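One reading of this paragraph (an interpretation, not confirmed by the post): a marked LL is wrong when the row just before it has an even lower Low. The fix is then to move the LL mark one row up, set it to that Low, and clear the current row. The mirror case for HH is a previous High above the marked HH. `fix_previous_extreme` is a hypothetical helper. It works on positions with NumPy and writes only the affected cells, so no column is shifted in the result.

```python
import numpy as np
import pandas as pd

def fix_previous_extreme(df, mark_col, price_col, kind):
    """Hypothetical helper. kind='min' for LL/Low, kind='max' for HH/High.

    Where row i carries a mark and row i-1's price is more extreme than that
    mark, write row i-1's price into mark_col at i-1 and set row i to NaN.
    Single pass; assumes marks never sit on adjacent rows.
    """
    out = df.copy()
    mark = out[mark_col].to_numpy(dtype=float, copy=True)
    price = out[price_col].to_numpy(dtype=float)
    prev_price, cur_mark = price[:-1], mark[1:]
    if kind == "min":
        better = prev_price < cur_mark
    elif kind == "max":
        better = prev_price > cur_mark
    else:
        raise ValueError("kind must be 'min' or 'max'")
    # Comparisons with NaN are False, so unmarked rows are skipped
    rows = np.flatnonzero(better) + 1   # positions whose mark moves up one row
    mark[rows - 1] = price[rows - 1]
    mark[rows] = np.nan
    out[mark_col] = mark
    return out

df = pd.DataFrame({
    "Low": [1.10, 1.05, 1.08, 1.12],
    "LL":  [np.nan, np.nan, 1.08, np.nan],
})
res = fix_previous_extreme(df, "LL", "Low", kind="min")
print(res)
```

In this example the LL mark on row 2 (1.08) moves to row 1, whose Low (1.05) is lower. The input frame is left unchanged.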


1 answer
Anonymous user
Answer #1 · Posted 2024-05-15 23:03:32

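To help you understand how `shift(-1)` works, take a look at the following solution. I looked at the image and recreated the original DataFrame from it.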

import pandas as pd
import numpy as np
df = pd.DataFrame({'Dates':['2021-02-04 19:00:00','2021-02-04 20:00:00',
                            '2021-02-04 21:00:00','2021-02-04 22:00:00',
                            '2021-02-04 23:00:00','2021-02-05 00:00:00',
                            '2021-02-05 01:00:00','2021-02-05 02:00:00'],
                   'Close':[1.19661,1.19660,1.19611,1.19643,1.19664,
                            1.19692,1.19662,1.19542],
                   'High' :[1.19679,1.19678,1.19680,1.19679,1.19688,
                            1.19721,1.19694,1.19682],
                   'Low'  :[1.19577,1.19637,1.19604,1.19590,1.19632,
                            1.19634,1.19622,1.19537],
                   'Open' :[1.19630,1.19662,1.19665,1.19613,1.19646,
                            1.19662,1.19690,1.19665],
                   'Status':['ok']*8,
                   'Volume':[2579,1858,1399,788,1437,2435,2898,2641],
                   'HH'   :[np.nan]*5+[1.19721]+[np.nan]*2,
                   'LL'   :[np.nan]*8})
print(df)

# make a copy of df['High'] into df['NewHigh']
df['NewHigh'] = df['High'].copy()

# if the next row's 'HH' is greater than this row's 'High', update 'NewHigh' with that next-row 'HH'
df.loc[df['HH'].shift(-1) > df['High'], 'NewHigh'] = df['HH'].shift(-1)

print(df[['Dates', 'High', 'HH', 'NewHigh']])

The output will be:

                 Dates     High       HH  NewHigh
0  2021-02-04 19:00:00  1.19679      NaN  1.19679
1  2021-02-04 20:00:00  1.19678      NaN  1.19678
2  2021-02-04 21:00:00  1.19680      NaN  1.19680
3  2021-02-04 22:00:00  1.19679      NaN  1.19679
4  2021-02-04 23:00:00  1.19688      NaN  1.19721 # <- This got updated
5  2021-02-05 00:00:00  1.19721  1.19721  1.19721
6  2021-02-05 01:00:00  1.19694      NaN  1.19694
7  2021-02-05 02:00:00  1.19682      NaN  1.19682

Note: I created a new column only to show the change. You can update High directly by using 'High' instead of 'NewHigh' in the `df.loc` line. That should do the trick.
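The answer stops at building `NewHigh`. If the goal is the literal procedure from the question (move the HH value onto the previous row and clear it on the current row), the same `shift(-1)` mask can drive both steps. A minimal sketch, assuming the previous row should receive the current row's HH value (the question does not say which value to write). The High values are rows 3 to 6 of the frame rebuilt above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "High": [1.19679, 1.19688, 1.19721, 1.19694],
    "HH":   [np.nan, np.nan, 1.19721, np.nan],
})

next_hh = df["HH"].shift(-1)                   # HH of the following row
mask = next_hh > df["High"]                    # NaN compares as False
moved_from = mask.shift(1, fill_value=False)   # rows whose HH gets copied up

df.loc[mask, "HH"] = next_hh[mask]             # previous row receives the value
df.loc[moved_from, "HH"] = np.nan              # current row is cleared
print(df)
```

Both masks are computed before either write, so clearing the current row cannot affect which previous rows are updated.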
