Python: arranging DataFrame columns for an OLS regression

Posted 2024-05-23 23:04:23


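I have a csv file with the following columns:

Date | Mkt-RF | SMB | HML | RF | C | aig-RF | ford-RF | ibm-RF | xom-RF |

I am trying to run a multiple OLS regression in Python, for example regressing aig-RF on 'Mkt-RF', 'SMB' and 'HML'.

It seems I first need to assemble a DataFrame from these columns, but I can't figure out how: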

# Regression

x = df[['Mkt-RF','SMB','HML']]
y = df['aig-RF']
df = pd.DataFrame({'x':x, 'y':y})
df['constant'] = 1
df.head()
sm.OLS(y,df[['constant','x']]).fit().summary()

The full code is:

import numpy as np
import pandas as pd
from pandas import DataFrame
from sklearn import linear_model
import statsmodels.api as sm

def ReadFF(sIn):
    """
    Purpose:
        Read the FF data

^{pr2}$

def JoinStock(df, sStock, sPer):
    """
    Purpose:
        Join the stock into the dataframe, as excess returns

    Inputs:
        df      dataframe, data including RF
        sStock  string, name of stock to read
        sPer    string, extension indicating period

    Return value:
        df      dataframe, enlarged
    """
    df1= pd.read_csv(sStock+"_"+sPer+".csv", index_col="Date", usecols=["Date", "Adj Close"])
    df1.columns= [sStock]

    # Add prices to original dataframe, to get correct dates
    df= df.join(df1, how="left")

    # Extract returns
    vR= 100*np.diff(np.log(df[sStock].values))
    # Add a missing, as one observation was lost differencing
    vR= np.hstack([np.nan, vR])

    # Add excess return to dataframe
    df[sStock + "-RF"]= vR - df["RF"]
    print(df)

    return df

def SaveFF(df, asStock, sOut):
    """
    Purpose:
        Save data for FF regressions

    Inputs:
        df      dataframe, all data
        asStock list of strings, stocks
        sOut    string, output file name

    Output:
        file written to disk
    """
    df= df.dropna(how='any')

    asOut= ['Mkt-RF', 'SMB', 'HML', 'RF', 'C']
    for sStock in asStock:
        asOut.append(sStock+"-RF")

    print ("Writing columns ", asOut, "to file ", sOut)

    df.to_csv(sOut, columns=asOut, index_label="Date", float_format="%.8g")

    print(df)
    return df

def main():
    sPer= "0018"
    sIn= "Research_Data_Factors_weekly.csv"
    sOut= "ffstocks"
    asStock= ["aig", "ford", "ibm", "xom"]

    # Initialisation
    df= ReadFF(sIn)
    for sStock in asStock:
        df= JoinStock(df, sStock, sPer)

    # Output
    SaveFF(df, asStock, sOut+"_"+sPer+".csv")
    print ("Done")

    # Regression
    x = df[['Mkt-RF','SMB','HML']]
    y = df['aig-RF']
    df = pd.DataFrame({'x':x, 'y':y})
    df['constant'] = 1
    df.head()
    sm.OLS(y,df[['constant','x']]).fit().summary()

What exactly do I need to change in the pandas DataFrame to get the multiple OLS regression table?
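As a side note on the data preparation, the excess-return step used in JoinStock can be checked on a toy series. This is only a sketch: the prices and risk-free values below are made up for illustration and are not taken from the actual csv files.

```python
import numpy as np

# Hypothetical adjusted closes and risk-free rates (in percent), for illustration only
prices = np.array([100.0, 110.0, 99.0])
rf = np.array([0.01, 0.01, 0.01])

# Percentage log returns; differencing loses one observation
vR = 100 * np.diff(np.log(prices))

# Pad with NaN at the front so the length matches the price series again
vR = np.hstack([np.nan, vR])

# Excess return, same formula as df[sStock + "-RF"] in JoinStock
excess = vR - rf
print(excess)
```

The first excess return is NaN by construction, so rows with missing values presumably still need to be dropped (or otherwise handled) before the regression is fitted.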


1 answer
User
#1 · Posted 2024-05-23 23:04:23

I suggest changing the first part of the code to the following (mainly a matter of reordering the lines):

# add constant column to the original dataframe
df['constant'] = 1

# define x as a subset of original dataframe
x = df[['Mkt-RF', 'SMB', 'HML', 'constant']]

# define y as a series
y = df['aig-RF']

# pass x as a dataframe, while pass y as a series
sm.OLS(y, x).fit().summary()

Hope this helps.
