How to get odds ratios and other related statistics with scikit-learn

import pandas as pd
from sklearn.linear_model import LogisticRegression

url = 'http://www.ats.ucla.edu/stat/mult_pkg/faq/general/sample.csv'
df = pd.read_csv(url, na_values=[''])
y = df.hon.values
X = df.math.values
y = y.reshape(200,1)
X = X.reshape(200,1)
clf = LogisticRegression(C=1e5)
clf.fit(X,y)
clf.coef_
clf.intercept_

3 Answers

User

Answer 1 · edited 2024-05-23 23:38:42

You can get the odds ratios by exponentiating the coefficients:

import numpy as np
X = df.female.values.reshape(200,1)
clf.fit(X,y)
np.exp(clf.coef_)

# array([[ 1.80891307]])

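As for the other statistics, they are not easy to get from scikit-learn (where model evaluation is mostly done using cross-validation). If you need them, you are better off using a different library such as statsmodels.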
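To see why exponentiating a coefficient gives an odds ratio, here is a minimal sketch that uses only the standard library. The intercept `b0` is a made-up value for illustration, and `b1` is chosen so that `exp(b1)` matches the 1.80891307 shown above; neither is refitted from the data.

```python
import math

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

def predict_proba(x, b0, b1):
    """Logistic model: P(y=1 | x) = 1 / (1 + exp(-(b0 + b1*x)))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

b0 = -1.5                   # hypothetical intercept, for illustration only
b1 = math.log(1.80891307)   # coefficient whose exponent is the value above

p0 = predict_proba(0, b0, b1)   # predicted probability when x = 0
p1 = predict_proba(1, b0, b1)   # predicted probability when x = 1

# The ratio of the two odds does not depend on b0; it equals exp(b1).
odds_ratio = odds(p1) / odds(p0)
print(odds_ratio, math.exp(b1))
```

Changing `b0` changes both probabilities but leaves the ratio of the odds unchanged, which is why the exponentiated coefficient can be read as the odds ratio for a one-unit increase in the predictor.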

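User

Answer 2 · edited 2024-05-23 23:38:42

In addition to @maxymoo's answer, you can also use statsmodels to get the other statistics. Assuming your data is in a DataFrame named df, the code below should print a good summary: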

import pandas as pd
from patsy import dmatrices
import statsmodels.api as sm 

y, X = dmatrices( 'label ~ age + gender', data=df, return_type='dataframe')
mod = sm.Logit(y, X)
res = mod.fit()
print(res.summary())

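User

Answer 3 · edited 2024-05-23 23:38:42

I don't know of a way to do this with scikit-learn, but Table2x2 from statsmodels.api.stats may be useful in your case, since it gives you the OR, SE, CI, and p-value in 3 lines of code: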

import numpy as np
import statsmodels.api as sm
table = sm.stats.Table2x2(np.array([[73, 756], [14, 826]]))
table.summary(method='normal')
"""
               Estimate    SE   LCB    UCB p-value
Odds ratio        5.697       3.189 10.178   0.000
Log odds ratio    1.740 0.296 1.160  2.320   0.000
Risk ratio        5.283       3.007  9.284   0.000
Log risk ratio    1.665 0.288 1.101  2.228   0.000
"""
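For reference, the odds-ratio rows in the table above can be checked by hand with the textbook Wald formulas. The sketch below uses only the standard library and is not necessarily how statsmodels computes these values internally; the cell names `a`, `b`, `c`, `d` are labels chosen here for the four counts.

```python
import math
from statistics import NormalDist

# Cell counts from the 2x2 table used above: [[a, b], [c, d]]
a, b, c, d = 73, 756, 14, 826

odds_ratio = (a * d) / (b * c)
log_or = math.log(odds_ratio)

# Wald standard error of the log odds ratio
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Two-sided 95% interval on the log scale, then mapped back with exp
z = NormalDist().inv_cdf(0.975)
log_lcb = log_or - z * se_log_or
log_ucb = log_or + z * se_log_or
lcb, ucb = math.exp(log_lcb), math.exp(log_ucb)

print(f"Odds ratio     {odds_ratio:.3f}  CI [{lcb:.3f}, {ucb:.3f}]")
print(f"Log odds ratio {log_or:.3f}  SE {se_log_or:.3f}  CI [{log_lcb:.3f}, {log_ucb:.3f}]")
```

The interval is built on the log scale, where the normal approximation is more reasonable, which is why the table lists an SE for the log odds ratio but not for the odds ratio itself.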
