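Identifying price swings/trends in stock quotes in a pandas DataFrame
I have a pandas DataFrame with a time index and the stock's open, high, low and close (OHLC) price columns. I would like to extract price swings or trends that meet a condition: an up swing must be larger than $0.30, and a down swing must be lower than -$0.30.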
df[:10]
close high low open volume
2014-05-09 09:30:00-04:00 187.5600 187.73 187.54 187.700 1922600
2014-05-09 09:31:00-04:00 187.4900 187.56 187.42 187.550 534400
2014-05-09 09:32:00-04:00 187.4200 187.51 187.35 187.490 224800
2014-05-09 09:33:00-04:00 187.5500 187.58 187.39 187.400 303700
2014-05-09 09:34:00-04:00 187.6700 187.67 187.53 187.560 438200
2014-05-09 09:35:00-04:00 187.6000 187.71 187.56 187.680 296400
2014-05-09 09:36:00-04:00 187.4100 187.67 187.38 187.600 329900
2014-05-09 09:37:00-04:00 187.3100 187.44 187.28 187.400 404000
2014-05-09 09:38:00-04:00 187.2600 187.37 187.26 187.300 912800
2014-05-09 09:39:00-04:00 187.2200 187.28 187.12 187.250 607700
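I looked through the pandas documentation and DataFrame.apply() seems like a good way to do this, but I am struggling to write the function. My programming skills are limited, so I could use some help.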
global row_nr
row_nr = 1
def extract_swings(row):
    if row_nr == 1:
        pivot = row.open
        row_nr += 1
    else:
        if (row.high - pivot) >= 0.3: ????
        ... ????
df['swings'] = df.apply(extract_swings, axis=1)
The final result should look like this:
df['swings'][:10]
2014-05-09 09:30:00-04:00 NaN
2014-05-09 09:31:00-04:00 NaN
2014-05-09 09:32:00-04:00 -0.35
2014-05-09 09:33:00-04:00 NaN
2014-05-09 09:34:00-04:00 NaN
2014-05-09 09:35:00-04:00 0.36
2014-05-09 09:36:00-04:00 NaN
2014-05-09 09:37:00-04:00 NaN
2014-05-09 09:38:00-04:00 NaN
2014-05-09 09:39:00-04:00 -0.59
Update: to avoid any confusion, here is how the requested function should process the DataFrame:
close high low open volume
2014-05-09 09:30:00-04:00 187.5600 187.73 187.54 187.700 1922600
# this is the first line, first minute and we will take row.open 187.70 as \
# the starting point or first pivot
2014-05-09 09:31:00-04:00 187.4900 187.56 187.42 187.550 534400
# next minute we check if either (row.high - pivot) >= 0.3 or \
# (row.low-pivot) <= -0.3. Neither is true so nothing to do here.
2014-05-09 09:32:00-04:00 187.4200 187.51 187.35 187.490 224800
# next minute same check ... we see that row.low-pivot = -0.35. \
# We consider 187.35 a second pivot and the diff value -0.35 a first trend down
2014-05-09 09:33:00-04:00 187.5500 187.58 187.39 187.400 303700
# next minute we check if the identified trend/swing down goes further \
# down by having a row.low lower than previous row.low. If we would \
# have found here a new lower row.low that would be the second pivot \
# and we would forget about 187.35 as being a pivot ... and so on. \
# We don't see that on this row, instead we see prices are higher than \
# previous row, so we start checking the diff for a potential up trend \
# starting from second pivot 187.35. As long as we do not encounter a \
# higher high with over 0.3 above last pivot we are still within the identified down trend.
2014-05-09 09:34:00-04:00 187.6700 187.67 187.53 187.560 438200
# we don't see a lower low to reconsider the second pivot neither \
# a (row.high- second_pivot) >= 0.3
2014-05-09 09:35:00-04:00 187.6000 187.71 187.56 187.680 296400
# here we see (row.high- second_pivot) = 0.36. We consider 187.71 as \
# a third_pivot and the diff value 0.36 as an up trend (from second pivot to here)
2014-05-09 09:36:00-04:00 187.4100 187.67 187.38 187.600 329900
# next minute we check if the identified trend/swing up goes further up \
# by having a row.high higher than third pivot. If we would have found here \
# a new higher row.high that would be the third pivot and we would forget \
# about 187.71 as being a pivot ... and so on. We don't see that on this row,\
# instead we see prices are lower than previous row, so we start \
# checking the diff for a potential down trend starting from third \
# pivot 187.71. As long as we do not encounter a lower low with \
# over 0.3 below last pivot we are still within the identified up trend.
2014-05-09 09:37:00-04:00 187.3100 187.44 187.28 187.400 404000
# we find here a (row.low - third_pivot) = -0.43 so we have identified \
# a new down trend starting from third pivot and now we have a potential\
# fourth pivot 187.28
2014-05-09 09:38:00-04:00 187.2600 187.37 187.26 187.300 912800
# we find here a lower low so we don't consider 187.28 the fourth \
# pivot anymore but this lower low 187.26
2014-05-09 09:39:00-04:00 187.2200 187.28 187.12 187.250 607700
# we find here a lower low so we don't consider 187.26 the fourth pivot anymore \
# but this lower low 187.12. Being this the lowest low we consider this one \
# to be the fourth pivot and the diff 187.12-187.71=-0.59 as a downtrend with that value
5 Answers
0
I updated @Pawel-Kozela's answer so that it works with the latest version of pandas, and added a simple way to pass in the column names.
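import numpy as np

def get_pivots(df, cols=['O', 'H', 'L', 'C']):
    # cols lists the open, high, low and close column names, in that order
    df['swings'] = np.nan
    # Seed the first and last rows with their open prices (prices, not swing sizes)
    df.loc[df.index[0], 'swings'] = df.loc[df.index[0], cols[0]]
    df.loc[df.index[-1], 'swings'] = df.loc[df.index[-1], cols[0]]
    pivot = df.loc[df.index[0], cols[0]]
    last_pivot_id = df.index[0]
    up_down = 0
    diff = .3

    for i, row in df.iterrows():
        # We don't have a trend yet
        if up_down == 0:
            if row[cols[2]] < pivot - diff:
                df.loc[i, 'swings'] = row[cols[2]] - pivot
                pivot, last_pivot_id = row[cols[2]], i
                up_down = -1
            elif row[cols[1]] > pivot + diff:
                df.loc[i, 'swings'] = row[cols[1]] - pivot
                pivot, last_pivot_id = row[cols[1]], i
                up_down = 1

        # Current trend is up
        elif up_down == 1:
            # If got higher than last pivot, update the swing
            if row[cols[1]] > pivot:
                # Move the swing to this row; the last pivot wasn't a real one
                df.loc[i, 'swings'] = df.loc[last_pivot_id, 'swings'] + (row[cols[1]] - pivot)
                df.loc[last_pivot_id, 'swings'] = np.nan
                pivot, last_pivot_id = row[cols[1]], i
            elif row[cols[2]] < pivot - diff:
                df.loc[i, 'swings'] = row[cols[2]] - pivot
                pivot, last_pivot_id = row[cols[2]], i
                # Change the trend indicator
                up_down = -1

        # Current trend is down (mirror of the up branch)
        elif up_down == -1:
            if row[cols[2]] < pivot:
                df.loc[i, 'swings'] = df.loc[last_pivot_id, 'swings'] + (row[cols[2]] - pivot)
                df.loc[last_pivot_id, 'swings'] = np.nan
                pivot, last_pivot_id = row[cols[2]], i
            elif row[cols[1]] > pivot + diff:
                df.loc[i, 'swings'] = row[cols[1]] - pivot
                pivot, last_pivot_id = row[cols[1]], i
                up_down = 1

    return df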
0
Assuming for now that you only care about the highs, you could do this:
startPx = df.open.iloc[0]
level = ((df.high - startPx) / .3).astype(int)
df['swings'] = level - level.shift(1)
Then, if you want the differences between them, you can do something like this:
changes = df[df.swings != 0]
diffs = changes.high - changes.open.shift(1)
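To make the bucketing in the first snippet easier to follow, here is a small sketch on made-up numbers. The highs and the start price below are hypothetical values chosen for illustration, not the question's data, and the names start_px and step are my own. It only shows what the level expression computes; it does not track pivots the way the question describes.

```python
import pandas as pd

# Hypothetical highs, chosen only to illustrate the bucketing.
high = pd.Series([100.0, 100.2, 100.35, 100.7, 99.6])
start_px = 100.0

# Number of whole 0.3 steps between each high and the start price.
# astype(int) truncates toward zero, so -1.33 becomes -1 rather than -2.
level = ((high - start_px) / 0.3).astype(int)

# Change in level from one bar to the next; the first bar has no predecessor.
step = level - level.shift(1)

print(level.tolist())
print(step.tolist())
```

On these numbers the levels come out as 0, 0, 1, 2, -1, so a nonzero step marks a bar where the high moved into a different 0.3-wide band relative to the start price.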
0
I haven't tested this, but something like the following should get you what you want. What happens if low < pivot - diff and high > pivot + diff both hold within the same minute?
import numpy as np

def f(df):
    pivot = df.open.iloc[0]
    diff = .3

    def proc(ser):
        # pivot lives in f's scope; nonlocal lets proc rebind it between rows
        nonlocal pivot
        res = np.nan
        if ser.low < pivot - diff:
            res, pivot = ser.low - pivot, ser.low
        elif ser.high > pivot + diff:
            res, pivot = ser.high - pivot, ser.high
        return res

    df['swings'] = df.apply(proc, axis=1)
    return df
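On the open question of a bar that breaks both thresholds in the same minute: with an if/elif chain the branch that is tested first wins, so the outcome depends on ordering rather than on which price was hit first. The sketch below just makes that case visible. The helper name classify_bar and the example prices are assumptions of mine, not part of the question.

```python
def classify_bar(low, high, pivot, diff=0.3):
    """Report which threshold(s) a single bar crosses relative to the pivot."""
    down = low < pivot - diff
    up = high > pivot + diff
    if down and up:
        return "both"   # OHLC data alone cannot say which came first
    if down:
        return "down"
    if up:
        return "up"
    return "none"


# Hypothetical wide bar around a pivot of 100.0
print(classify_bar(low=99.5, high=100.5, pivot=100.0))      # both
# The 09:32 bar from the question, measured from the first pivot 187.70
print(classify_bar(low=187.35, high=187.51, pivot=187.70))  # down
```

With one-minute OHLC bars there is no way to tell from the row itself whether the low or the high printed first, so any rule for the "both" case is a convention you have to pick.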
1
Updated tw0000's code: the lines that hard-coded 'O' had a small bug and should use cols[0] instead.
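import numpy as np

def get_pivots(df, cols=['O', 'H', 'L', 'C']):
    # cols lists the open, high, low and close column names, in that order
    df['swings'] = np.nan
    # Seed the first and last rows with their open prices (prices, not swing sizes)
    df.loc[df.index[0], 'swings'] = df.loc[df.index[0], cols[0]]
    df.loc[df.index[-1], 'swings'] = df.loc[df.index[-1], cols[0]]
    pivot = df.loc[df.index[0], cols[0]]
    last_pivot_id = df.index[0]
    up_down = 0
    diff = .3

    for i, row in df.iterrows():
        # We don't have a trend yet
        if up_down == 0:
            if row[cols[2]] < pivot - diff:
                df.loc[i, 'swings'] = row[cols[2]] - pivot
                pivot, last_pivot_id = row[cols[2]], i
                up_down = -1
            elif row[cols[1]] > pivot + diff:
                df.loc[i, 'swings'] = row[cols[1]] - pivot
                pivot, last_pivot_id = row[cols[1]], i
                up_down = 1

        # Current trend is up
        elif up_down == 1:
            # If got higher than last pivot, update the swing
            if row[cols[1]] > pivot:
                # Move the swing to this row; the last pivot wasn't a real one
                df.loc[i, 'swings'] = df.loc[last_pivot_id, 'swings'] + (row[cols[1]] - pivot)
                df.loc[last_pivot_id, 'swings'] = np.nan
                pivot, last_pivot_id = row[cols[1]], i
            elif row[cols[2]] < pivot - diff:
                df.loc[i, 'swings'] = row[cols[2]] - pivot
                pivot, last_pivot_id = row[cols[2]], i
                # Change the trend indicator
                up_down = -1

        # Current trend is down (mirror of the up branch)
        elif up_down == -1:
            if row[cols[2]] < pivot:
                df.loc[i, 'swings'] = df.loc[last_pivot_id, 'swings'] + (row[cols[2]] - pivot)
                df.loc[last_pivot_id, 'swings'] = np.nan
                pivot, last_pivot_id = row[cols[2]], i
            elif row[cols[1]] > pivot + diff:
                df.loc[i, 'swings'] = row[cols[1]] - pivot
                pivot, last_pivot_id = row[cols[1]], i
                up_down = 1

    return df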
7
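This is a bit tricky, because you cannot mark a point as a pivot until you have found the next possible pivot. For example, while you are following an up trend, you cannot say the trend has ended until you find a low that is low enough.
The code below handles this. I put your data in a file called tmpData.txt for convenience and got the desired result. Please take a look.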
import numpy as np
import pandas as pd

def get_pivots():
    # DataFrame.from_csv, irow and .ix no longer exist in current pandas;
    # read_csv with these arguments and .iloc replace them
    data = pd.read_csv('tmpData.txt', index_col=0, parse_dates=True)
    data['swings'] = np.nan
    swings = data.columns.get_loc('swings')  # column position, for .iloc
    pivot = data.iloc[0].open
    last_pivot_id = 0
    up_down = 0
    diff = .3

    for i in range(0, len(data)):
        row = data.iloc[i]

        # We don't have a trend yet
        if up_down == 0:
            if row.low < pivot - diff:
                data.iloc[i, swings] = row.low - pivot
                pivot, last_pivot_id = row.low, i
                up_down = -1
            elif row.high > pivot + diff:
                data.iloc[i, swings] = row.high - pivot
                pivot, last_pivot_id = row.high, i
                up_down = 1

        # Current trend is up
        elif up_down == 1:
            # If got higher than last pivot, update the swing
            if row.high > pivot:
                # Remove the last pivot, as it wasn't a real one
                # (pivot still holds the high of the row at last_pivot_id)
                data.iloc[i, swings] = data.iloc[last_pivot_id, swings] + (row.high - pivot)
                data.iloc[last_pivot_id, swings] = np.nan
                pivot, last_pivot_id = row.high, i
            elif row.low < pivot - diff:
                data.iloc[i, swings] = row.low - pivot
                pivot, last_pivot_id = row.low, i
                # Change the trend indicator
                up_down = -1

        # Current trend is down
        elif up_down == -1:
            # If got lower than last pivot, update the swing
            if row.low < pivot:
                # Remove the last pivot, as it wasn't a real one
                data.iloc[i, swings] = data.iloc[last_pivot_id, swings] + (row.low - pivot)
                data.iloc[last_pivot_id, swings] = np.nan
                pivot, last_pivot_id = row.low, i
            # A reversal needs a high more than diff above the pivot
            elif row.high > pivot + diff:
                data.iloc[i, swings] = row.high - pivot
                pivot, last_pivot_id = row.high, i
                # Change the trend indicator
                up_down = 1

    print(data)
Output:
date close high low open volume swings
2014-05-09 13:30:00 187.56 187.73 187.54 187.70 1922600 NaN
2014-05-09 13:31:00 187.49 187.56 187.42 187.55 534400 NaN
2014-05-09 13:32:00 187.42 187.51 187.35 187.49 224800 -0.35
2014-05-09 13:33:00 187.55 187.58 187.39 187.40 303700 NaN
2014-05-09 13:34:00 187.67 187.67 187.53 187.56 438200 NaN
2014-05-09 13:35:00 187.60 187.71 187.56 187.68 296400 0.36
2014-05-09 13:36:00 187.41 187.67 187.38 187.60 329900 NaN
2014-05-09 13:37:00 187.31 187.44 187.28 187.40 404000 NaN
2014-05-09 13:38:00 187.26 187.37 187.26 187.30 912800 NaN
2014-05-09 13:39:00 187.22 187.28 187.12 187.25 607700 -0.59
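If you want to try the idea without a tmpData.txt file, here is a minimal self-contained sketch of the same state machine on the ten rows from the question. It is a restatement under my own assumptions, not the code above: the function name find_swings, the origin variable (the price the current swing started from) and the in-memory sample frame with a timezone-naive index are all introduced here for illustration.

```python
import numpy as np
import pandas as pd


def find_swings(df, diff=0.3):
    """Return swing sizes per row; NaN where no pivot is currently marked."""
    high = df["high"].to_numpy()
    low = df["low"].to_numpy()
    swings = np.full(len(df), np.nan)
    pivot = df["open"].iloc[0]  # first pivot: open of the first bar
    origin = pivot              # price the current swing started from
    last = None                 # position of the provisional pivot
    trend = 0                   # 0: no trend yet, 1: up, -1: down

    for i in range(len(df)):
        extends_up = trend == 1 and high[i] > pivot
        extends_down = trend == -1 and low[i] < pivot
        if extends_up or extends_down:
            # Same trend, more extreme price: move the provisional pivot here.
            swings[last] = np.nan
            pivot = high[i] if extends_up else low[i]
        elif trend != -1 and low[i] < pivot - diff:
            # First swing or reversal downwards.
            origin, pivot, trend = pivot, low[i], -1
        elif trend != 1 and high[i] > pivot + diff:
            # First swing or reversal upwards.
            origin, pivot, trend = pivot, high[i], 1
        else:
            continue
        swings[i] = pivot - origin
        last = i

    return pd.Series(swings, index=df.index, name="swings")


# The ten one-minute bars from the question.
index = pd.date_range("2014-05-09 09:30", periods=10, freq="min")
sample = pd.DataFrame(
    {
        "open": [187.70, 187.55, 187.49, 187.40, 187.56,
                 187.68, 187.60, 187.40, 187.30, 187.25],
        "high": [187.73, 187.56, 187.51, 187.58, 187.67,
                 187.71, 187.67, 187.44, 187.37, 187.28],
        "low": [187.54, 187.42, 187.35, 187.39, 187.53,
                187.56, 187.38, 187.28, 187.26, 187.12],
        "close": [187.56, 187.49, 187.42, 187.55, 187.67,
                  187.60, 187.41, 187.31, 187.26, 187.22],
    },
    index=index,
)

result = find_swings(sample)
print(result.round(2))
```

On this sample the only non-NaN rows are 09:32, 09:35 and 09:39, with values of about -0.35, 0.36 and -0.59, matching the table above. Measuring each swing from origin instead of adding increments to the previous swing value is just a design choice to keep the bookkeeping in one place.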