Python sklearn logistic regression k-fold cross-validation: how to create a DataFrame for coef_

print(file.head())

   Result  Interest  Limit  Service  Convenience  Trust  Speed
0       0         1      1        1            1      1      1
1       0         1      1        1            1      1      1
2       0         1      1        1            1      1      1
3       0         4      4        3            4      2      3
4       1         4      4        4            4      4      4

0.823061630219

             0          1
0     Interest   0.163577
1        Limit  -0.161104
2      Service   0.323073
3  Convenience   0.121573
4        Trust   0.370012
5        Speed   0.089934
6        Major   0.183002
7          Ads  0.0137151

1 Answer

Answer #1 · Posted on 2024-04-26 07:46:26

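cross_val_score is a helper function that wraps scikit-learn's cross-validation objects (e.g. KFold, StratifiedKFold). It returns an array of scores, one per fold, computed according to the scoring parameter (for classification problems I believe the default is accuracy).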
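To make the return value concrete, here is a minimal sketch. It assumes synthetic data from make_classification rather than the asker's survey table, and the names X_demo, y_demo and cv are made up for illustration. It only shows that the helper hands back one score per fold and that leaving scoring unset gives the same numbers as scoring="accuracy" for a classifier.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (an assumption, not the data from the question)
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

# A fixed, shuffled splitter so both calls below see identical folds
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = LogisticRegression(max_iter=1000)

# With scoring left unset, the estimator's own score method is used,
# which is mean accuracy for classifiers
default_scores = cross_val_score(clf, X_demo, y_demo, cv=cv)
accuracy_scores = cross_val_score(clf, X_demo, y_demo, cv=cv, scoring="accuracy")

print(default_scores)         # one score per fold
print(default_scores.mean())  # the single summary number usually reported
```

Note that nothing in the returned array points back to the fitted models, which is the limitation discussed next.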

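The object returned by cross_val_score does not give you access to the underlying folds or models used during cross-validation, which means you cannot get the coefficients of each model from it.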
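As a side note that is not part of the original answer: the related helper cross_validate accepts return_estimator=True, which keeps the fitted model from every fold. The sketch below assumes synthetic data and hypothetical column names (feature_0 to feature_5); treat it as one possible alternative to writing the loop by hand, not as the approach the answer recommends.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate

# Synthetic stand-in data with hypothetical feature names
X_arr, y_arr = make_classification(n_samples=200, n_features=6, random_state=0)
feature_names = [f"feature_{i}" for i in range(X_arr.shape[1])]
X_demo = pd.DataFrame(X_arr, columns=feature_names)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
result = cross_validate(
    LogisticRegression(max_iter=1000),
    X_demo,
    y_arr,
    cv=cv,
    return_estimator=True,  # keep the model fitted on each training fold
)

# result["estimator"] holds one fitted model per fold;
# coef_[0] is the coefficient vector of a binary logistic regression
coef_frame = pd.DataFrame(
    [est.coef_[0] for est in result["estimator"]],
    columns=feature_names,
)
print(result["test_score"])
print(coef_frame)
```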

To get the coefficients from each round of cross-validation, you need to use KFold directly (or StratifiedKFold if your classes are imbalanced).

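import pandas as pd
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression

# Copy the sample table from the question, then read it from the clipboard
df = pd.read_clipboard()
# Triplicate the rows so each class has enough samples to split;
# drop=True keeps the old index from turning into a feature column
file = pd.concat([df, df, df]).reset_index(drop=True)

X = file.drop(columns=['Result'])
y = file['Result']

# Recent scikit-learn versions reject random_state unless shuffle=True
skf = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)

models, coefs = [], []  # in case you want to inspect the models later, too
for train, test in skf.split(X, y):
    print(train, test)
    # The L1 penalty needs a solver that supports it, such as liblinear
    clf = LogisticRegression(penalty='l1', solver='liblinear')
    clf.fit(X.iloc[train], y.iloc[train])  # split() yields positional indices
    models.append(clf)
    coefs.append(clf.coef_[0])

# Mean of each feature's coefficient across the folds
pd.DataFrame(coefs, columns=X.columns).mean()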

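This gives the mean coefficient of each feature across the folds.

I had to create the data from your sample (which has only one instance of the positive class), so the coefficients came out as 0 for me. I suspect the numbers will not be 0 for you.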
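If you want the per-fold coefficients as a DataFrame rather than only their mean, something along these lines should work. This is a sketch on synthetic data: the column names are borrowed from the question, but the values come from make_classification, and names such as skf_demo and coef_by_fold are made up for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Column names taken from the question; the values are synthetic (an assumption)
cols = ["Interest", "Limit", "Service", "Convenience", "Trust", "Speed"]
X_arr, y_arr = make_classification(n_samples=120, n_features=len(cols), random_state=1)
X_demo = pd.DataFrame(X_arr, columns=cols)
y_demo = pd.Series(y_arr, name="Result")

skf_demo = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
rows = []
for train_idx, _ in skf_demo.split(X_demo, y_demo):
    model = LogisticRegression(penalty="l1", solver="liblinear")
    model.fit(X_demo.iloc[train_idx], y_demo.iloc[train_idx])
    rows.append(model.coef_[0])  # one coefficient vector per fold

# One row per fold, one column per feature
coef_by_fold = pd.DataFrame(rows, columns=cols)
coef_by_fold.index.name = "fold"

# Mean and spread of every coefficient across the folds
summary = coef_by_fold.agg(["mean", "std"]).T
print(coef_by_fold)
print(summary)
```

Keeping the rows separate lets you see how stable each coefficient is from fold to fold, which the mean alone hides.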

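Edit: Because StratifiedKFold (or KFold) gives us the cross-validation splits of the dataset, you can still compute the cross-validation scores yourself with the model's score method.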

The version below differs slightly from the one above so that it also records the cross-validation score for each fold.

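models, scores, coefs = [], [], []  # in case you want to inspect the models later, too
for train, test in skf.split(X, y):
    print(train, test)
    # The L1 penalty needs a solver that supports it, such as liblinear
    clf = LogisticRegression(penalty='l1', solver='liblinear')
    clf.fit(X.iloc[train], y.iloc[train])  # split() yields positional indices
    # Mean accuracy on the held-out fold
    score = clf.score(X.iloc[test], y.iloc[test])
    models.append(clf)
    scores.append(score)
    coefs.append(clf.coef_[0])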
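As a sanity check on the claim that the score method reproduces the cross-validation scores, the sketch below compares a manual loop with cross_val_score on the same splitter. It assumes synthetic data and a default LogisticRegression; the names splitter, manual_scores and helper_scores are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data (an assumption, not the data from the question)
X_demo, y_demo = make_classification(n_samples=150, n_features=6, random_state=2)

# A fixed random_state makes the splitter yield the same folds on every call
splitter = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

manual_scores = []
for train_idx, test_idx in splitter.split(X_demo, y_demo):
    model = LogisticRegression(max_iter=1000)
    model.fit(X_demo[train_idx], y_demo[train_idx])
    # score() returns mean accuracy on the held-out fold
    manual_scores.append(model.score(X_demo[test_idx], y_demo[test_idx]))

helper_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X_demo, y_demo, cv=splitter
)

print(np.round(manual_scores, 3))
print(np.round(helper_scores, 3))
print(np.allclose(manual_scores, helper_scores))
```

The manual loop costs a few more lines than the helper, but it is the version that leaves the fitted models, and therefore coef_, within reach.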
