Iterative OLS models with Python Pandas and statsmodels run very slowly? (Possibly misusing DataFrames!)

Posted 2024-05-16 11:47:09

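I am using statsmodels and Pandas to automate an iterative process of fitting linear regressions to many combinations of variables. The total number of variable combinations is 697,343. That is a lot of OLS fits, but I did not expect it to take this long (more than 1 hour). X is at most 18×18, and Y is always 18×1.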

Can someone tell me whether the code I'm using is poorly optimized, and suggest a solution?
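For reference, the statistics the loop collects for a single combination can be computed directly with NumPy and SciPy. This is a minimal sketch, not statsmodels' implementation: the helper name `ols_stats` is hypothetical, and it assumes, as in the code below, that no intercept column is added. For that case the statsmodels documentation defines R² as 1 − SSR / uncentered TSS, which is what the sketch uses. Because X can be as large as 18×18, some combinations leave zero residual degrees of freedom; the sketch returns NaN for adjusted R², F and its p-value in that case.

```python
import numpy as np
from scipy import stats


def ols_stats(X, y):
    """Hypothetical helper: OLS without an intercept, returning
    (R², adjusted R², F, p-value of F, coefficients)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()  # also accepts an (n, 1) list of lists
    n = X.shape[0]
    # Least-squares coefficients; the rank gives the degrees of freedom.
    params, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    rank = int(rank)
    resid = y - X @ params
    ssr = float(resid @ resid)
    tss = float(y @ y)  # uncentered total sum of squares (no constant term)
    r2 = 1.0 - ssr / tss
    df_model, df_resid = rank, n - rank
    if df_model == 0 or df_resid == 0:
        # Saturated (or empty) model: these statistics are undefined.
        return r2, np.nan, np.nan, np.nan, params
    r2_adj = 1.0 - n / df_resid * (1.0 - r2)
    fvalue = ((tss - ssr) / df_model) / (ssr / df_resid)
    f_pvalue = float(stats.f.sf(fvalue, df_model, df_resid))
    return r2, r2_adj, fvalue, f_pvalue, params
```

Each call is one least-squares solve on an array of at most 18×18, and Y can be passed in the question's list-of-lists form because it is flattened with `ravel()`.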

import time
import pandas
import statsmodels.api as sm

# Forward slashes avoid backslash-escape problems in Windows paths
perm = pandas.read_pickle("C:/SharedData/Temp/ResultTestDataframes/perm")
BB = pandas.read_pickle("C:/SharedData/Temp/ResultTestDataframes/BB")
wdb_demog = pandas.read_pickle("C:/SharedData/Temp/ResultTestDataframes/wdb_demog")
wdb_hts = pandas.read_pickle("C:/SharedData/Temp/ResultTestDataframes/wdb_hts")

result_db = pandas.DataFrame(columns=('R-squared value', 'Adj. R-squared', 'F-statistic',
                                      'Prob (F-statistic)', 'coefficients', 'Variables'))
row = -1
for v in range(len(perm)):
    row += 1
    # Column names for this combination, with duplicates and the None padding removed
    variables_columns = list(set(perm.loc[v]))
    if None in variables_columns:
        variables_columns.remove(None)
    # Exogenous matrix X and endogenous vector Y for this combination
    X = pandas.DataFrame(BB[variables_columns]).values.tolist()
    Y = pandas.DataFrame(BB[wdb_hts.columns.values[1]]).values.tolist()
    model = sm.OLS(Y, X)
    results = model.fit()
    R = [round(results.rsquared, 4),
         round(results.rsquared_adj, 4),
         round(results.fvalue, 4),
         round(results.f_pvalue, 4),
         list(results.params),
         list(variables_columns)]
    # Append one row to the result table
    result_db.loc[row] = pandas.Series(R, index=result_db.columns)

result_db.to_pickle("C:/SharedData/Temp/ResultTestDataframes/TEST")
print("done! " + time.strftime("%c"))
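The loop above grows `result_db` with `result_db.loc[row] = ...`, one row at a time. As far as I know, pandas reallocates the frame on each such enlargement, which becomes expensive over roughly 700,000 rows. A common alternative, sketched here with a hypothetical helper `build_results` that receives already-computed fit results, is to collect plain Python lists and build the DataFrame once at the end.

```python
import pandas as pd

COLUMNS = ["R-squared value", "Adj. R-squared", "F-statistic",
           "Prob (F-statistic)", "coefficients", "Variables"]


def build_results(fits):
    """Hypothetical helper: `fits` is an iterable of
    (r2, r2_adj, fvalue, f_pvalue, params, columns) tuples.
    Rows are gathered in a list; the DataFrame is created once at the end."""
    rows = []
    for r2, r2_adj, fvalue, f_pvalue, params, columns in fits:
        rows.append([round(r2, 4), round(r2_adj, 4), round(fvalue, 4),
                     round(f_pvalue, 4), list(params), list(columns)])
    return pd.DataFrame(rows, columns=COLUMNS)
```

Applied to the question's loop, this would mean appending `R` to a list inside the loop and calling `pandas.DataFrame(rows, columns=...)` once after it.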

--------------------

# BB is a DataFrame (18 rows × 90 columns).
# perm is a DataFrame (697343 × 17) holding every combination of variables. X (the exogenous variables) is built from a given combination of variables and the data in BB.
# wdb_hts is another DataFrame, used to read the variable name from which Y (the endogenous variable) is constructed.
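Based on the description above, each row of `perm` names up to 17 columns of `BB`, padded with `None`. A minimal sketch of preparing the inputs outside the loop follows, assuming every column of `BB` is numeric and that the padding value is `None` (as the `if None in ...` check suggests). Y is extracted once, and each X is sliced from a NumPy copy of `BB` instead of going through `pandas.DataFrame(...).values.tolist()` on every iteration. The helper `design_matrices` and the demo frames are hypothetical. Unlike `set()`, `dict.fromkeys` keeps the column order of each `perm` row, so the coefficient order may differ from the original code's output.

```python
import pandas as pd


def design_matrices(BB, perm, y_column):
    """Hypothetical helper: return Y once, plus a generator of (columns, X)
    pairs, one per row of `perm`."""
    data = BB.to_numpy(dtype=float)  # assumes every column of BB is numeric
    position = {name: i for i, name in enumerate(BB.columns)}
    y = data[:, position[y_column]]

    def combos():
        for combo in perm.itertuples(index=False):
            # Drop the None padding and repeated names, keeping row order.
            cols = list(dict.fromkeys(c for c in combo if c is not None))
            yield cols, data[:, [position[c] for c in cols]]

    return y, combos()


# Toy usage with made-up data (not the question's files)
BB_demo = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "y": [7.0, 8.0, 9.0]})
perm_demo = pd.DataFrame([["a", "b", None], ["b", "b", None]], dtype=object)
y_demo, pairs = design_matrices(BB_demo, perm_demo, "y")
for cols, X in pairs:
    print(cols, X.shape)
```

Because the pairs come from a generator, the 697,343 design matrices are produced one at a time rather than held in memory together.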
