sklearn (scikit-learn) logistic regression package: setting trained coefficients for classification

Asked 2025-04-17 08:31 by 数据工程师

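I have read the scikit-learn documentation page for this class:

http://scikit-learn.sourceforge.net/dev/modules/generated/sklearn.linear_model.LogisticRegression.html

I can fit logistic regression to my data, and once I have a LogisticRegression instance I can use it to classify new data points. So far so good.

However, is there a way to set the coefficients of a LogisticRegression() instance? Once I have the trained coefficients, I would like to use the same API to classify new data points.

Alternatively, can someone recommend another Python machine learning package with a better API?

Thanks!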

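Tags: machine learning, data classification, scikit-learn, data fitting, classification model, logistic regression, trained coefficients, model interface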

2 Answers

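The coefficients are attributes of the estimator object that you created when you instantiated the LogisticRegression class, so you can access them in the normal Python way: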

>>> import numpy as NP
>>> from sklearn.linear_model import LogisticRegression as LR
>>> from sklearn import datasets as DS
>>> digits = DS.load_digits()
>>> D = digits.data
>>> T = digits.target

>>> # instantiate an estimator instance (classifier) of the Logistic Reg class
>>> clf = LR()
>>> # train the classifier
>>> clf.fit( D[:-1], T[:-1] )
    LogisticRegression(C=1.0, dual=False, fit_intercept=True, 
      intercept_scaling=1, penalty='l2', tol=0.0001)

>>> # attributes are accessed in the normal python way
>>> dx = clf.__dict__
>>> dx.keys()
    ['loss', 'C', 'dual', 'fit_intercept', 'class_weight_label', 'label_', 
     'penalty', 'multi_class', 'raw_coef_', 'tol', 'class_weight', 
     'intercept_scaling']
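The key list above comes from an older release. As a rough sketch, assuming a recent scikit-learn where the fitted parameters are exposed as `coef_`, `intercept_` and `classes_`, you can read them and reproduce what `predict` does by hand. The iris data set and `max_iter=1000` are my own choices here, picked only to keep the example quick:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumption: iris is used instead of digits only because it fits quickly.
X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Fitted parameters live in attributes with a trailing underscore.
print(clf.coef_.shape)       # one row of weights per class
print(clf.intercept_.shape)  # one intercept per class

# Reproduce predict() by hand: linear scores, then the highest-scoring class.
scores = X @ clf.coef_.T + clf.intercept_
manual = clf.classes_[np.argmax(scores, axis=1)]
same_as_predict = np.array_equal(manual, clf.predict(X))
print(same_as_predict)
```

So the coefficients and the intercepts are all you need to classify a new point, which is why saving those arrays is enough to rebuild the classifier later.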

That is how you get at the coefficients, but if you just want to use them for prediction, a more direct way is to call the estimator's predict method:

>>> # instantiate the L/R classifier, passing in norm used for penalty term 
>>> # and regularization strength
>>> clf = LR(C=.2, penalty='l1')
>>> clf
    LogisticRegression(C=0.2, dual=False, fit_intercept=True, 
      intercept_scaling=1, penalty='l1', tol=0.0001)

>>> # train on rows 151 onward so that rows 0-150 stay held out
>>> clf = clf.fit(D[151:], T[151:])
>>> # select some test instances from the held-out rows
>>> # [of course the model should not have been trained on these instances]
>>> test = NP.random.randint(0, 151, 5)
>>> d = D[test,:]     # randomly selected data points w/o class labels
>>> t = T[test]       # the class labels that correspond to the points in d

>>> # generate model predictions for these 5 data points
>>> v = clf.predict(d)
>>> v
    array([0, 0, 2, 0, 2], dtype=int32)
>>> # how well did the model do?
>>> percent_correct = 100*NP.sum(t==v)/t.shape[0]
>>> percent_correct
    100
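Five random points are a very small check. A slightly more systematic version, assuming a recent scikit-learn with `train_test_split` in `sklearn.model_selection`, could look like the sketch below. The 25% split, the pixel rescaling and the default L2 penalty are my own choices, not part of the session above:

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X = X / 16.0  # digits pixel values range from 0 to 16; rescale to [0, 1]

# Hold out a quarter of the rows; the model never sees them during fit().
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Assumption: default L2 penalty, with max_iter raised so the solver converges.
clf = LogisticRegression(C=0.2, max_iter=1000)
clf.fit(X_train, y_train)

# Fraction of held-out points whose predicted label matches the true label.
y_pred = clf.predict(X_test)
accuracy = (y_pred == y_test).mean()
print(f"held-out accuracy: {accuracy:.3f}")
```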

Answered 2025-04-17 by Python大师


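Actually, estimator.coef_ and estimator.intercept_ are read-only properties rather than ordinary attributes, so they cannot be modified freely. Their values come from the estimator.raw_coef_ array, whose memory layout maps directly onto the layout expected by the underlying liblinear C++ implementation of logistic regression. This avoids any memory copy of the parameters when calling estimator.predict or estimator.predict_proba.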

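I agree that read-only properties are a limitation, and we should find a way to fix it. If we refactor this implementation, though, we should also take care not to introduce unnecessary memory copies, which, after a quick look at the source code, is not trivial.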

I have opened an issue on the tracker so that this limitation is not forgotten.

In the meantime, you can read the @property-decorated estimator.coef_ method to understand how estimator.coef_ and estimator.raw_coef_ are related, and change the values in estimator.raw_coef_ directly.
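For readers on a newer release: as far as I can tell, recent scikit-learn versions store `coef_` and `intercept_` as plain attributes, so the workaround through `raw_coef_` should no longer be needed. A minimal sketch under that assumption follows. The variable names (`saved_coef` and so on) are hypothetical, and assigning fitted attributes by hand is not an officially documented workflow, so treat it as an illustration only:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
trained = LogisticRegression(max_iter=1000).fit(X, y)

# Pretend these three arrays were saved earlier (hypothetical names).
saved_coef = trained.coef_.copy()
saved_intercept = trained.intercept_.copy()
saved_classes = trained.classes_.copy()

# Build an unfitted estimator and assign the saved parameters to it.
# Assumption: these are plain attributes in the installed release.
restored = LogisticRegression()
restored.coef_ = saved_coef
restored.intercept_ = saved_intercept
restored.classes_ = saved_classes

# The restored estimator is used through the same predict() interface.
predictions_match = np.array_equal(restored.predict(X), trained.predict(X))
print(predictions_match)
```

Note that `classes_` has to be set as well, because `predict` uses it to map the winning score back to a class label.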

Answered 2025-04-17 by Python大师
