Pandas: optimizing a multi-condition, reverse nested loop over a DataFrame

import pandas as pd

def slowFunctionToOptimize():
    # Variables definition
    minVolume = 2000
    exchange1 = 'binance'
    exchange2 = 'bitmart'
    volEx1Str = 'volume_' + exchange1
    volEx2Str = 'volume_' + exchange2
    threshold = 15.0
    minDuration = 10.0

    # See below for an example dataset
    dataset = pd.read_csv('example.csv', sep='|')
    indicesLst = dataset.index.values
    minIndexLst = indicesLst[0]

    # Get all indices that exceed or are equal to the specified threshold,
    # and normalize with the first index value to work with "iloc" later on
    indicesThresh = dataset.index[dataset.diffprice >= threshold].values - minIndexLst

    pv = None
    prevEndIndex = len(dataset)

    # Get the largest possible number of rows (sequential order) based on the volume mean
    # of the two exchanges, where the first value exceeds or is equal to the threshold
    for startInd in indicesThresh:
        for endInd in range(prevEndIndex, 0, -1):
            if endInd - startInd < minDuration:
                break
            dfTmp = dataset.iloc[startInd:endInd, :]
            avgVolume1 = dfTmp[volEx1Str].mean()
            avgVolume2 = dfTmp[volEx2Str].mean()
            if avgVolume1 > minVolume and avgVolume2 > minVolume:
                # Get the final result.
                pv = dfTmp.copy()
                break
        # Largest number of rows found, exiting
        if pv is not None:
            break
        prevEndIndex = startInd

    if pv is None:
        print('No combination could be found for this iteration.')
        return None
    return pv

1 answer

User

#1 · Posted 2024-04-27 03:55:50

There are a few things you can do to optimize this code:

By adding a "cumulative volume" column for each of bitmart and binance, the averages can be computed much more efficiently.


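The sum of volumes over a window is then the difference of two cumulative values, dataset['cumulative volume'][endInd] - dataset['cumulative volume'][startInd], and dividing that by the number of rows, endInd - startInd, gives the average volume. With a plain cumulative sum this covers rows startInd + 1 through endInd, so shift both indices by one if you need to match dataset.iloc[startInd:endInd] exactly.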
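As a minimal sketch of the cumulative-sum idea, the snippet below builds prefix sums with a leading zero so that the window mean lines up with `iloc[start:end]` without any index shifting. The toy values, the `prefix` array, and the `window_mean` helper are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column name follows the question's 'volume_<exchange>' pattern.
df = pd.DataFrame({"volume_binance": [1000.0, 3000.0, 2500.0, 4000.0, 1500.0]})

# Prefix sums with a leading 0, so prefix[i] is the sum of the first i rows.
prefix = np.concatenate(([0.0], df["volume_binance"].to_numpy().cumsum()))

def window_mean(prefix, start, end):
    """Mean of rows start..end-1, i.e. the same rows as df.iloc[start:end]."""
    return (prefix[end] - prefix[start]) / (end - start)

start, end = 1, 4
fast = window_mean(prefix, start, end)               # O(1) per window
slow = df["volume_binance"].iloc[start:end].mean()   # O(window length) per window
```

Both `fast` and `slow` describe the same three rows; the prefix-sum version simply avoids re-reading the window each time it is evaluated.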

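Only update the data you actually need: copying a DataFrame is expensive, so avoid rebuilding dfTmp on every iteration. Just keep track of startInd and endInd, and use the trick above to compute the average volume.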
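Putting both suggestions together, the search could look roughly like the sketch below. It keeps the question's loop structure (including its `prevEndIndex = startInd` update) and only swaps the per-iteration slice and `.mean()` calls for prefix-sum arithmetic, copying the DataFrame once at the very end. The function name `find_window` and its parameters are hypothetical, and the sketch assumes the DataFrame has a contiguous integer index, as the question's index normalization implies.

```python
import numpy as np
import pandas as pd

def find_window(dataset, vol_col1, vol_col2,
                threshold=15.0, min_volume=2000, min_duration=10):
    # Prefix sums with a leading 0: p[i] is the sum of the first i rows.
    p1 = np.concatenate(([0.0], dataset[vol_col1].to_numpy(dtype=float).cumsum()))
    p2 = np.concatenate(([0.0], dataset[vol_col2].to_numpy(dtype=float).cumsum()))

    # Positional indices of rows whose diffprice reaches the threshold.
    starts = np.flatnonzero(dataset["diffprice"].to_numpy() >= threshold)

    prev_end = len(dataset)
    for start in starts:
        for end in range(prev_end, 0, -1):
            n = end - start
            if n < min_duration:
                break
            avg1 = (p1[end] - p1[start]) / n
            avg2 = (p2[end] - p2[start]) / n
            if avg1 > min_volume and avg2 > min_volume:
                # Copy only once, when the final window is known.
                return dataset.iloc[start:end].copy()
        # Same update as in the question's code.
        prev_end = start
    return None
```

A call such as `find_window(dataset, 'volume_binance', 'volume_bitmart')` returns the matching rows, or `None` when no window qualifies.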

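There are probably other tricks you could use as well, but without knowing the exact kind of data you are working with, I don't think I can help much further.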
