Statsmodels Logit.fit_regularized runs forever

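    from sklearn.feature_extraction.text import CountVectorizer

    # sss is a splitter defined earlier in the asker's code (not shown here)
    for train_index, test_index in sss.split(df_modelo.Cuerpo, df_modelo.Dummy_genero):
        # the splitter yields positional indices, so index with .iloc
        X_train, X_test = df_modelo.Cuerpo.iloc[train_index], df_modelo.Cuerpo.iloc[test_index]
        y_train, y_test = df_modelo.Dummy_genero.iloc[train_index], df_modelo.Dummy_genero.iloc[test_index]
        # bag-of-words features: unigrams only, dropping very common and very rare terms
        cvectorizer = CountVectorizer(max_df=0.97, min_df=3, ngram_range=(1, 1))
        vec = cvectorizer.fit(X_train)
        X_train_vectorized = vec.transform(X_train)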

1 Answer

User

#1 · Posted 2024-06-09 06:10:31

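Almost all of statsmodels, and all of its inference, is designed for the case where the number of observations is much larger than the number of features.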

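Logit.fit_regularized uses an interior-point algorithm with scipy optimizers, which needs to keep all features in memory. Inference for the parameters requires the covariance of the parameter estimates, which has shape n_features by n_features. The use case it was designed for is one where the number of features is relatively small compared to the number of observations, and the Hessian can be held in memory.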

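GLM.fit_regularized estimates elastic net penalized parameters and uses coordinate descent. This can possibly handle a large number of features, but it does not have any inferential results available.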

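Inference after Lasso and similar penalization that selects variables has only become available in recent research. See, for example, selective inference in Python, https://github.com/selective-inference/Python-software, for which an R package is also available.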
