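So I wrote some code that works well. However, it is too slow, because I run it many times. I would like to optimize it with vectorized operations, but I am struggling to find a way to do so, as I am not yet an expert in pandas.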
import pandas as pd

def slowFunctionToOptimize():
    # Variables definition
    minVolume = 2000
    exchange1 = 'binance'
    exchange2 = 'bitmart'
    volEx1Str = 'volume_' + exchange1
    volEx2Str = 'volume_' + exchange2
    threshold = 15.0
    minDuration = 10.0
    # See below for an example dataset
    dataset = pd.read_csv('example.csv', sep='|')
    indicesLst = dataset.index.values
    minIndexLst = indicesLst[0]
    # Get all indices that exceed or are equal to the specified threshold,
    # and normalize with the first index value to work with "iloc" later on
    indicesThresh = dataset.index[dataset.diffprice >= threshold].values - minIndexLst
    pv = None
    prevEndIndex = len(dataset)
    # Get the largest possible number of rows (in sequential order) based on the volume mean
    # of the two exchanges, where the first value exceeds or is equal to the threshold
    for startInd in indicesThresh:
        for endInd in range(prevEndIndex, 0, -1):
            if endInd - startInd < minDuration:
                break
            dfTmp = dataset.iloc[startInd:endInd, :]
            avgVolume1 = dfTmp[volEx1Str].mean()
            avgVolume2 = dfTmp[volEx2Str].mean()
            if avgVolume1 > minVolume and avgVolume2 > minVolume:
                # Get the final result.
                pv = dfTmp.copy()
                break
        # Largest number of rows found, exiting
        if pv is not None:
            break
        prevEndIndex = startInd
    if pv is None:
        print('No combination could be found for this iteration.')
        return
    return pv
Here is the "example.csv" dataset:
Here is the expected output (the variable "pv" returned by the function):
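To optimize this code, there are a couple of things you can do:
Compute the cumulative sum of each volume column once, before the loops. Then the total volume over a window is just the difference between the cumulative volume at its two ends, for example
dataset['cumulative volume'][endInd] - dataset['cumulative volume'][startInd]
(with the end points chosen to match the slice boundaries), and dividing that by the number of rows in the window gives the average volume.
With that in place there is no need to build dfTmp for every candidate window. Just keep track of startInd and endInd and use the previous trick to compute the average volume.
There are probably some other tricks you could use, but without knowing the exact kind of data you are working with, I don't think I can help you much more.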
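A minimal sketch of the cumulative-sum trick described above, assuming the volume columns are numeric and rows are addressed by position as with iloc. The helper names prefix_sums and window_mean and the toy values are hypothetical and are not part of the original post; the leading zero in the prefix array is one way to make the end points line up with iloc[start:end].

```python
import numpy as np
import pandas as pd


def prefix_sums(column):
    """Cumulative sums with a leading 0, so prefix[i] is the sum of the first i rows."""
    return np.concatenate(([0.0], np.cumsum(column.to_numpy(dtype=float))))


def window_mean(prefix, start, end):
    """Mean of rows start..end-1, i.e. the same rows as iloc[start:end]."""
    return (prefix[end] - prefix[start]) / (end - start)


# Hypothetical toy data; the column name mirrors 'volume_binance' from the question.
toy = pd.DataFrame({'volume_binance': [1000.0, 3000.0, 2500.0, 4000.0, 1500.0]})
prefix = prefix_sums(toy['volume_binance'])

# Constant-time mean from the prefix array versus the slice-based mean.
fast = window_mean(prefix, 1, 4)
slow = toy['volume_binance'].iloc[1:4].mean()

# Means for every candidate end position of one start, computed in a single step.
start = 1
ends = np.arange(len(toy), start, -1)  # largest window first
all_means = (prefix[ends] - prefix[start]) / (ends - start)
```

With one prefix array per volume column, the inner loop can compare these means against minVolume without slicing the DataFrame, and only the final window needs to be materialised with iloc.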