我是一个python新手,所以我希望我的两个问题是明确和完整的。我在下面以csv格式发布了实际代码和测试数据集。
我已经能够构造以下代码(主要是在StackOverflow贡献者的帮助下)来使用Newton-Raphson方法计算期权合约的隐含波动率。该过程在确定隐含波动率时计算Vega。尽管我可以使用Pandas DataFrame apply方法为隐含波动率创建一个新的DataFrame列,但我无法为Vega创建第二个列。当函数同时返回IV&Vega时,是否可以创建两个单独的数据帧列?
我试过:
return iv, vega
来自函数df[['myIV', 'Vega']] = df.apply(newtonRap, axis=1)
ValueError: Shape of passed values is (56, 2), indices imply (56, 13)
也尝试过:
return iv, vega
来自函数df['myIV'], df['Vega'] = df.apply(newtonRap, axis=1)
ValueError: Shape of passed values is (56, 2), indices imply (56, 13)
另外,计算过程很慢。我导入了numba并实现了@jit(nogil=True)装饰器,但我只看到性能提高了25%。测试数据集是性能测试有近90万条记录。运行时间为2小时9分钟,无numba或有numba,但witout nogil=True。使用numba和@jit(nogil=True)时的运行时间为1小时32分钟。我能做得更好吗?
from datetime import datetime
from math import sqrt, pi, log, exp, isnan
from scipy.stats import norm
from numba import jit
# dff = Daily Fed Funds (Posted rate is usually one day behind)
dff = pd.read_csv('https://research.stlouisfed.org/fred2/data/DFF.csv', parse_dates=[0], index_col='DATE')
rf = float('%.4f' % (dff['VALUE'][-1:][0] / 100))
# rf = .0015 # Get Fed Funds Rate https://research.stlouisfed.org/fred2/data/DFF.csv
tradingMinutesDay = 450 # 7.5 hours per day * 60 minutes per hour
tradingMinutesAnnum = 113400 # trading minutes per day * 252 trading days per year
cal = USFederalHolidayCalendar() # Load US Federal holiday calendar
@jit(nogil=True) # nogil=True arg improves performance by 25%
def newtonRap(row):
"""Estimate Implied Volatility (IV) using Newton-Raphson method
:param row (dataframe): Options contract params for function
TimeStamp (datetime): Close date
Expiry (datetime): Option contract expiration date
Strike (float): Option strike
OptType (object): 'C' for call; 'P' for put
RootPrice (float): Underlying close price
Bid (float): Option contact closing bid
Ask (float): Option contact closing ask
:return:
float: Estimated implied volatility
"""
if row['Bid'] == 0.0 or row['Ask'] == 0.0 or row['RootPrice'] == 0.0 or row['Strike'] == 0.0 or \
row['TimeStamp'] == row['Expiry']:
iv, vega = 0.0, 0.0 # Set iv and vega to zero if option contract is invalid or expired
else:
# dte (Days to expiration) uses pandas bdate_range method to determine the number of business days to expiration
# minus USFederalHolidays minus constant of 1 for the TimeStamp date
dte = float(len(pd.bdate_range(row['TimeStamp'], row['Expiry'])) -
len(cal.holidays(row['TimeStamp'], row['Expiry']).to_pydatetime()) - 1)
mark = (row['Bid'] + row['Ask']) / 2
cp = 1 if row['OptType'] == 'C' else -1
S = row['RootPrice']
K = row['Strike']
# T = the number of trading minutes to expiration divided by the number of trading minutes in year
T = (dte * tradingMinutesDay) / tradingMinutesAnnum
# TODO get dividend value
d = 0.00
iv = sqrt(2 * pi / T) * mark / S # Closed form estimate of IV Brenner and Subrahmanyam (1988)
vega = 0.0
for i in range(1, 100):
d1 = (log(S / K) + T * (rf - d + iv ** 2 / 2)) / (iv * sqrt(T))
d2 = d1 - iv * sqrt(T)
vega = S * norm.pdf(d1) * sqrt(T)
model = cp * S * norm.cdf(cp * d1) - cp * K * exp(-rf * T) * norm.cdf(cp * d2)
iv -= (model - mark) / vega
if abs(model - mark) < 1.0e-9:
break
if isnan(iv) or isnan(vega):
iv, vega = 0.0, 0.0
# TODO Return vega with iv if add'l pandas column possible
# return iv, vega
return iv
if __name__ == "__main__":
# test function from baseline data
get_csv = True
if get_csv:
csvHeaderList = ['TimeStamp', 'OpraSymbol', 'RootSymbol', 'Expiry', 'Strike', 'OptType', 'RootPrice', 'Last',
'Bid', 'Ask', 'Volume', 'OpenInt', 'IV']
fileName = 'C:/tmp/test-20150930-56records.csv'
df = pd.read_csv(fileName, parse_dates=[0, 3], names=csvHeaderList)
else:
pass
start = datetime.now()
# TODO Create add'l pandas dataframe column, if possible, for vega
# df[['myIV', 'Vega']] = df.apply(newtonRap, axis=1)
# df['myIV'], df['Vega'] = df.apply(newtonRap, axis=1)
df['myIV'] = df.apply(newtonRap, axis=1)
end = datetime.now()
print end - start
测试数据:C/tmp/Test-20150930-56records.csv
2015-09-30 16:00:00,AAPL151016C00109000,AAPL,2015-10-16 16 16:00:00109,C,109.95,3.46,3.6,3.715651290,0.3497 2015-09-30 16:00:00,aapl151016p0019000,AAPL,2015-10-16 16 16:00:00109,电话:109.95,2.4,2.34,2.4237903087,0.3146 2015-09-30 16:00:00,AAPL151016C0011000,AAPL,2015-10-16 16 16:00:00110,C,109.95,3,2.86,31021728850,0.3288 2015-09-30 16:00:00,AAPL151016P00110000,AAPL,2015-10-16 16 16:00:00110,电话:109.95,2.81,2.74,2.81211344427,0.3029 2015-09-30 16:00:00,AAPL151016C00111000,AAPL,2015-10-16 16 16:00:00111,C,109.95,2.35,2.44,2.4566742318,0.3187 2015-09-30 16:00:00,AAPL151016P00111000,AAPL,2015-10-16 16 16:00:00111,电话:109.95,3.2,3.1,3.2520313773,0.2926 2015-09-30 16:00:00,AAPL151120C0011000,AAPL,2015-11-20 16:00:00110,C,109.95,5.9,5.7,5.95533017112,0.3635 2015-09-30 16:00:00,AAPL151120P00110000,AAPL,2015-11-20 16:00:00110,电话:109.95,6.15,6.1,6.3372415704,0.3842
就numba的性能而言,numba对pandas数据帧一无所知,无法将其上的操作编译成快速的机器代码。最好的方法是分析方法的哪个部分比较慢(例如使用line_profiler),然后将该部分卸载到另一个方法,该方法使用dataframe列的^{} 属性构造输入,该属性允许您访问底层numpy数组。否则numba只会在“对象模式”下运行(参见numba glossary),不会显著提高性能
将代码矢量化的诀窍是不要按行考虑,而是按列考虑。
我几乎要完成这项工作了(稍后我会尽力完成),但你想做的是:
如果我理解正确,您应该做的是从函数返回一个序列。类似于:
如果要将结果放入相同输入数据帧的新列中,请执行以下操作:
相关问题 更多 >
编程相关推荐