group-lasso: Python package - PyPI

Fast group lasso regularised linear models in a scikit-learn-style API.

Detailed description of the group-lasso Python project


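The group lasso [1] regulariser is a well-known method for achieving structured sparsity in machine learning and statistics. The idea is to create non-overlapping groups of covariates and recover regression weights in which only a sparse set of these covariate groups have non-zero components.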

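There are several reasons why this might be a good idea. Say, for example, that we have a set of sensors and each of these sensors generates five measurements. We don't want to maintain an unnecessary number of sensors. If we try normal lasso regression, then we will get sparse components. However, these sparse components might not correspond to a sparse set of sensors, since each sensor generates five measurements. If we instead use group lasso with the measurements grouped by the sensor that measured them, then we will get a sparse set of sensors.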
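The sensor example can be made concrete with a small NumPy sketch. It assumes a hypothetical setup of four sensors with five measurements each, builds the group label for every feature column, and reads off which sensors survive from a made-up coefficient vector of the kind a group lasso fit might return. The helper `selected_groups` is illustrative and not part of the library.

```python
import numpy as np

# Hypothetical setup: 4 sensors, each producing 5 measurements (feature columns).
num_sensors = 4
measurements_per_sensor = 5

# Group label for every feature column: [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, ...]
groups = np.repeat(np.arange(num_sensors), measurements_per_sensor)

# Made-up coefficients: sensors 1 and 3 have been switched off entirely.
coef = np.array([
    0.5, -1.2, 0.3, 0.0, 2.1,    # sensor 0
    0.0, 0.0, 0.0, 0.0, 0.0,     # sensor 1
    -0.7, 0.4, 0.9, 1.1, -0.2,   # sensor 2
    0.0, 0.0, 0.0, 0.0, 0.0,     # sensor 3
])


def selected_groups(coef, groups):
    """Return the group labels whose block of coefficients has a non-zero norm."""
    return [int(g) for g in np.unique(groups) if np.linalg.norm(coef[groups == g]) > 0]


print(selected_groups(coef, groups))  # sensors that still need to be maintained
```

Note that coefficient 3 of sensor 0 is zero, yet the sensor is still selected: group sparsity is decided per block, not per coefficient.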

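A generalisation of the group lasso regulariser is the sparse group lasso regulariser [2], which induces both group-wise sparsity and coefficient-wise sparsity. This is done by combining the group lasso penalty with the traditional lasso penalty. In this library, I have implemented a sparse group lasso solver that is fully compliant with the scikit-learn API.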
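As a rough illustration, the combined penalty can be written as l1_reg * ||w||_1 + group_reg * sum_g sqrt(|g|) * ||w_g||_2. The sketch below computes it with NumPy. The argument names mirror the `group_reg` and `l1_reg` arguments used in the transformer example. The sqrt(group size) weighting follows the convention in Simon et al. [2] and is an assumption: the library may scale the group term differently.

```python
import numpy as np


def sparse_group_lasso_penalty(w, groups, l1_reg, group_reg):
    """Sketch of the sparse group lasso penalty.

    l1_reg * ||w||_1 + group_reg * sum_g sqrt(|g|) * ||w_g||_2
    (the sqrt(|g|) group weight is an assumed convention, see [2]).
    """
    w = np.asarray(w, dtype=float).ravel()
    groups = np.asarray(groups)
    # Lasso part: rewards individual zero coefficients
    l1_term = np.sum(np.abs(w))
    # Group lasso part: rewards whole groups being zero
    group_term = sum(
        np.sqrt(np.sum(groups == g)) * np.linalg.norm(w[groups == g])
        for g in np.unique(groups)
    )
    return l1_reg * l1_term + group_reg * group_term


# Two groups of two coefficients; the second group is entirely zero.
print(sparse_group_lasso_penalty([3.0, 4.0, 0.0, 0.0], [0, 0, 1, 1], l1_reg=1.0, group_reg=1.0))
```

Setting `group_reg=0` recovers the ordinary lasso penalty, and setting `l1_reg=0` recovers the plain group lasso penalty.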

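About this project

This project is developed by Yngve Mardal Moe and released under the MIT license. I am still working out a few things, so changes might come rapidly.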

Installation

Group-lasso requires Python 3.5+, numpy and scikit-learn. To install group-lasso via pip, simply run the command:

pip install group-lasso

Alternatively, you can manually pull this repository and run the setup.py file:

git clone https://github.com/yngvem/group-lasso.git
cd group-lasso
python setup.py install

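Documentation

You can read the full documentation on readthedocs.

Examples

Group lasso regression

The group lasso regulariser is implemented following the scikit-learn API, making it easy to use for those familiar with the Python ML ecosystem.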

import numpy as np
from group_lasso import GroupLasso

# Dataset parameters
num_data_points = 10_000
num_features = 500
num_groups = 25
assert num_features % num_groups == 0

# Generate data matrix
X = np.random.standard_normal((num_data_points, num_features))

# Generate coefficients and intercept
w = np.random.standard_normal((num_features, 1))
intercept = 2

# Generate groups and randomly set coefficients to zero
group_size = num_features // num_groups
groups = np.array([[group] * group_size for group in range(num_groups)]).ravel()
for group in range(num_groups):
    # Each group is kept with probability 0.8 and zeroed out otherwise
    w[groups == group] *= np.random.random() < 0.8

# Generate target vector, with noise at 30% of the signal norm
y = X @ w + intercept
noise = np.random.standard_normal(y.shape)
noise /= np.linalg.norm(noise)
noise *= 0.3 * np.linalg.norm(y)
y += noise

# Generate group lasso object and fit the model (l1_reg=0 gives a pure group lasso)
gl = GroupLasso(groups=groups, group_reg=0.05, l1_reg=0)
gl.fit(X, y)
estimated_w = gl.coef_
estimated_intercept = gl.intercept_[0]

# Evaluate the model
coef_correlation = np.corrcoef(w.ravel(), estimated_w.ravel())[0, 1]
print(
    "True intercept: {intercept:.2f}. Estimated intercept: {estimated_intercept:.2f}".format(
        intercept=intercept, estimated_intercept=estimated_intercept
    )
)
print(
    "Correlation between true and estimated coefficients: {coef_correlation:.2f}".format(
        coef_correlation=coef_correlation
    )
)

True intercept: 2.00. Estimated intercept: 1.53
Correlation between true and estimated coefficients: 0.98

Group lasso as a transformer

Group lasso regression can also be used as a transformer:

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.linear_model import Ridge
from group_lasso import GroupLasso

# Dataset parameters
num_data_points = 10_000
num_features = 500
num_groups = 25
assert num_features % num_groups == 0

# Generate data matrix
X = np.random.standard_normal((num_data_points, num_features))

# Generate coefficients and intercept
w = np.random.standard_normal((num_features, 1))
intercept = 2

# Generate groups and randomly set coefficients to zero
group_size = num_features // num_groups
groups = np.array([[group] * group_size for group in range(num_groups)]).ravel()
for group in range(num_groups):
    w[groups == group] *= np.random.random() < 0.8

# Generate target vector
y = X @ w + intercept
noise = np.random.standard_normal(y.shape)
noise /= np.linalg.norm(noise)
noise *= 0.3 * np.linalg.norm(y)
y += noise

# Generate group lasso object and fit the model
# We use an artificially high regularisation coefficient since
# we want to use group lasso as a variable selection algorithm.
gl = GroupLasso(groups=groups, group_reg=0.1, l1_reg=0.05)
gl.fit(X, y)
new_X = gl.transform(X)

# Evaluate the model
predicted_y = gl.predict(X)
R_squared = 1 - np.sum((y - predicted_y) ** 2) / np.sum(y ** 2)

print("The columns with zero-valued coefficients have now been removed from the dataset.")
print("The new shape is:", new_X.shape)
print("The R^2 statistic for the group lasso model is: {R_squared:.2f}".format(R_squared=R_squared))
print("This is very low since the regularisation is so high.")

# Use group lasso in a scikit-learn pipeline
pipe = Pipeline(
    memory=None,
    steps=[
        ("variable_selection", GroupLasso(groups=groups, group_reg=0.1, l1_reg=0.05)),
        ("regressor", Ridge(alpha=0.1)),
    ],
)
pipe.fit(X, y)
predicted_y = pipe.predict(X)
R_squared = 1 - np.sum((y - predicted_y) ** 2) / np.sum(y ** 2)

print("The R^2 statistic for the pipeline is: {R_squared:.2f}".format(R_squared=R_squared))

The columns with zero-valued coefficients have now been removed from the dataset.
The new shape is: (10000, 280)
The R^2 statistic for the group lasso model is: 0.17
This is very low since the regularisation is so high.
The R^2 statistic for the pipeline is: 0.72

Further work

The todos are, in descending order of importance:

Python 3.5 compatibility

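Implementation details

The problem is solved using the FISTA optimiser [4] with a gradient-based adaptive restart scheme [5]. No line search is currently implemented, but I hope to look at that later.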
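To make this concrete, here is a minimal NumPy sketch of FISTA with gradient-based adaptive restart on a group lasso least-squares problem. It is not the library's implementation. The following are all assumptions made for illustration: the loss scaling (1/2n), the sqrt(group size) weight on each group norm, the fixed step 1/L, and the helper names `group_soft_threshold` and `fista_group_lasso`. The proximal step for the group penalty is block soft-thresholding. The restart test follows [5]: momentum is reset whenever (y_{k-1} - w_k)ᵀ(w_k - w_{k-1}) > 0.

```python
import numpy as np


def group_soft_threshold(w, groups, threshold):
    """Proximal operator of threshold * sum_g sqrt(|g|) * ||w_g||_2 (block soft-thresholding)."""
    out = w.copy()
    for g in np.unique(groups):
        mask = groups == g
        norm = np.linalg.norm(w[mask])
        shrink = threshold * np.sqrt(mask.sum())
        # Zero the whole group if its norm is below the threshold, otherwise shrink it radially
        out[mask] = 0.0 if norm <= shrink else (1.0 - shrink / norm) * w[mask]
    return out


def fista_group_lasso(X, y, groups, group_reg, n_iter=500):
    """Sketch: minimise (1/2n)||Xw - y||^2 + group_reg * sum_g sqrt(|g|)||w_g||_2
    using FISTA with gradient-based adaptive restart and a fixed step 1/L (no line search)."""
    n, p = X.shape
    lipschitz = np.linalg.norm(X, ord=2) ** 2 / n  # largest eigenvalue of X^T X / n
    step = 1.0 / lipschitz
    w = np.zeros(p)
    momentum_point = w.copy()
    t = 1.0
    for _ in range(n_iter):
        # Proximal gradient step taken from the extrapolated (momentum) point
        grad = X.T @ (X @ momentum_point - y) / n
        w_new = group_soft_threshold(momentum_point - step * grad, groups, step * group_reg)
        if np.dot(momentum_point - w_new, w_new - w) > 0:
            # Restart: the step moved against the generalised gradient, so drop the momentum
            t_new = 1.0
            momentum_point = w_new
        else:
            # Standard FISTA momentum update
            t_new = (1.0 + np.sqrt(1.0 + 4.0 * t**2)) / 2.0
            momentum_point = w_new + ((t - 1.0) / t_new) * (w_new - w)
        w, t = w_new, t_new
    return w
```

With synthetic data in which some groups are truly zero, a fit such as `fista_group_lasso(X, y, groups, group_reg=0.05)` should set those groups exactly to zero, because the proximal step zeroes whole blocks at once.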

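Although fast, the FISTA optimiser does not achieve loss values as low as the significantly slower second-order interior point methods. This might, at first glance, seem like a problem. However, it does recover the sparsity pattern of the data, which can be used to train a new model on the selected subset of features.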
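One way to use the recovered sparsity pattern is to refit an unpenalised least-squares model on the selected columns only. This removes the shrinkage bias of the penalised solution. The sketch below is a hypothetical helper, `refit_on_support`, written in plain NumPy; it fits no intercept. The pipeline example with `Ridge` is the library-based counterpart of the same idea.

```python
import numpy as np


def refit_on_support(X, y, coef, tol=0.0):
    """Least-squares refit using only the columns whose coefficients are non-zero.

    Returns the boolean support mask and a full-length coefficient vector
    that is zero outside the support (no intercept is fitted).
    """
    support = np.abs(np.ravel(coef)) > tol
    refit = np.zeros(X.shape[1])
    if support.any():
        solution, *_ = np.linalg.lstsq(X[:, support], np.ravel(y), rcond=None)
        refit[support] = solution
    return support, refit
```

For example, passing in a shrunken estimate such as `0.8 * true_coef` on noise-free data returns the same support and the unshrunken coefficients.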

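Also, even though the FISTA optimiser is not meant for stochastic optimisation, in my experience it has not suffered a large drop in performance when the mini-batch was large enough. I have therefore implemented mini-batch optimisation using FISTA, and have thus been able to fit models based on data with about 500 columns and 10 000 rows on my laptop.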
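The mini-batch idea amounts to replacing the full gradient of the least-squares loss with an estimate computed from a random subset of rows. The hypothetical helper below, `minibatch_gradient`, sketches this for the loss (1/2n)||Xw - y||^2. Rows are drawn without replacement, which keeps the estimate unbiased. How the library actually samples rows is not described here, so treat this as an assumption.

```python
import numpy as np


def minibatch_gradient(X, y, w, batch_size, rng):
    """Unbiased estimate of the gradient of (1/2n)||Xw - y||^2 from a random row subset."""
    n = X.shape[0]
    rows = rng.choice(n, size=batch_size, replace=False)
    X_b, y_b = X[rows], y[rows]
    # Average per-row gradient over the mini-batch
    return X_b.T @ (X_b @ w - y_b) / batch_size
```

In a FISTA loop such as the one sketched for the implementation details, this would replace the line that computes `grad`, trading exactness for a cost per iteration that scales with `batch_size` rather than with the number of rows.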

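Finally, we note that since FISTA uses Nesterov acceleration, it is not a descent algorithm. We can therefore not expect the loss to decrease monotonically.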
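The non-monotone behaviour is easy to reproduce. The sketch below runs plain accelerated gradient descent (FISTA with no proximal term and no restart) on an assumed toy ill-conditioned quadratic, f(x) = 0.5 xᵀAx, and counts how often the loss goes up. The momentum makes the iterates overshoot the minimum and oscillate along the flat direction. The adaptive restart scheme [5] is designed to damp exactly this effect.

```python
import numpy as np

# Toy ill-conditioned quadratic f(x) = 0.5 * x^T A x, minimised at x = 0.
A = np.diag([1.0, 0.01])
step = 1.0  # 1/L, where L = 1 is the largest eigenvalue of A


def loss(x):
    return 0.5 * x @ A @ x


x_prev = x = np.array([1.0, 1.0])
t = 1.0
losses = [loss(x)]
for _ in range(200):
    t_next = (1 + np.sqrt(1 + 4 * t**2)) / 2
    y = x + ((t - 1) / t_next) * (x - x_prev)  # Nesterov extrapolation
    x_prev, x = x, y - step * (A @ y)          # gradient step from the extrapolated point
    t = t_next
    losses.append(loss(x))

increases = sum(b > a for a, b in zip(losses, losses[1:]))
print(f"loss increased in {increases} of 200 iterations")
```

The overall trend is still downward, so the final loss is far below the starting value even though individual iterations increase it.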

References

[1]	Yuan, M. and Lin, Y. (2006), Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68: 49-67. doi:10.1111/j.1467-9868.2005.00532.x

[2]	Simon, N., Friedman, J., Hastie, T. and Tibshirani, R. (2013), A sparse-group lasso. Journal of Computational and Graphical Statistics, 22(2), 231-245.

[3]	Yuan, L., Liu, J. and Ye, J. (2011), Efficient methods for overlapping group lasso. Advances in Neural Information Processing Systems (pp. 352-360).

[4]	Beck, A. and Teboulle, M. (2009), A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM Journal on Imaging Sciences, 2(1), 183-202. doi:10.1137/080716542

[5]	O’Donoghue, B. and Candès, E. (2015), Adaptive Restart for Accelerated Gradient Schemes. Foundations of Computational Mathematics, 15, 715. doi:10.1007/s10208-013-9150-


group-lasso 1.0.0
