Performance issue with operations on a DataFrame

Posted 2024-03-28 09:44:54

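I have a pandas DataFrame containing 1-minute OHLC data (19,724 rows). I want to add two new columns that track the lowest and highest price over the past 3 days (including today, up to the price bar currently being processed, and skipping missing days). However, I am running into a performance problem: %timeit reports 57 seconds for the for loop. I am looking for a way to speed this up (vectorization? I tried, but I have to admit I am struggling).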

import os

import numpy as np
import pandas as pd

#Import the data and put it in a DataFrame. The DataFrame should contain
#the following fields: DateTime (the index), Open, Close, High, Low, Volume.

#----------------------
#The following assumes the first column of the file is the datetime
dfData=pd.read_csv(os.path.join(DataLocation,FileName),index_col='Date')
dfData.index=pd.to_datetime(dfData.index,dayfirst=True)
#Note: tz_localize returns a new index; the result is not assigned here, so the index stays timezone-naive
dfData.index.tz_localize('Singapore')

# Calculate the list of unique dates in the dataframe to find T-2
ListOfDates=pd.to_datetime(dfData.index.date).unique()

#Add ExtMin and ExtMax columns to the DataFrame to keep track of the min and max over a certain window
dfData['ExtMin']=np.nan
dfData['ExtMax']=np.nan

#For each row in the DataFrame, calculate the minimum and maximum price reached over the past 3 days including today.

def addMaxMin(dfData):
    for index,row in dfData.iterrows():
        #Find the current date in ListOfDates (time stripped out) and step back 2 entries
        Start=ListOfDates[max(0,ListOfDates.get_loc(pd.Timestamp(index.date()))-2)]
        #Select the bars from Start up to, but not including, the current bar
        Window=dfData[(Start<=dfData.index) & (dfData.index<index)]
        #Populate the ExtMin and ExtMax columns (.loc replaces .ix, which was removed in pandas 1.0)
        dfData.loc[index,'ExtMin']=Window['LOW'].min()
        dfData.loc[index,'ExtMax']=Window['HIGH'].max()
    return dfData

%timeit addMaxMin(dfData)

Thanks
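
For reference, the calculation described above might be vectorized roughly as follows. This is only a sketch under stated assumptions, not a tested drop-in replacement: it assumes the index is a DatetimeIndex, that the columns are named LOW and HIGH as in the loop, and that the window is meant to match the loop exactly (from the start of the second-previous available date up to, but not including, the current bar). The function name add_max_min_vectorized, its prev_days parameter, and the sample data are hypothetical. The idea is to combine a running min/max within each day with the min/max of the two preceding available days.

```python
import numpy as np
import pandas as pd


def add_max_min_vectorized(df, low_col="LOW", high_col="HIGH", prev_days=2):
    """Hypothetical helper: loop-free ExtMin/ExtMax over today plus prev_days earlier dates."""
    out = df.sort_index()
    # Integer id per calendar date present in the data (missing days are skipped).
    day_id = pd.factorize(out.index.normalize())[0]

    def extreme(col, cum, agg, combine):
        by_day = out[col].groupby(day_id)
        # Running extreme within the current day, shifted to exclude the current bar.
        intraday = getattr(by_day, cum)().groupby(day_id).shift(1)
        # Extreme of each full day, then of the prev_days days before each day.
        daily = getattr(by_day, agg)()
        earlier = getattr(daily.shift(1).rolling(prev_days, min_periods=1), agg)()
        # np.fmin / np.fmax ignore NaN, so a missing side does not wipe out the other.
        return combine(intraday.to_numpy(), earlier.to_numpy()[day_id])

    out["ExtMin"] = extreme(low_col, "cummin", "min", np.fmin)
    out["ExtMax"] = extreme(high_col, "cummax", "max", np.fmax)
    return out


# Made-up sample with a missing day (2024-01-03), for illustration only.
sample = pd.DataFrame(
    {"LOW": [5.0, 4.0, 6.0, 3.0, 7.0, 8.0], "HIGH": [6.0, 7.0, 9.0, 5.0, 8.0, 10.0]},
    index=pd.to_datetime([
        "2024-01-01 09:00", "2024-01-01 09:01", "2024-01-02 09:00",
        "2024-01-04 09:00", "2024-01-04 09:01", "2024-01-05 09:00",
    ]),
)
result = add_max_min_vectorized(sample)
print(result)
```

The first bar in the data has no earlier bars, so its ExtMin and ExtMax stay NaN, as they would in the loop. If the current bar should be included in the window, dropping the shift(1) on the intraday running extreme would be the change to try.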

