SVM (support vector machine) in Python always gives the same prediction


I have a set of 4 principal components (see pc1 and pc2 below), which I use as input variables to predict my y variable (y-var below). I tried to predict y-var with an SVM using pc1 and pc2, as follows:

from sklearn.decomposition import PCA
from mlxtend.plotting import plot_decision_regions
from sklearn.svm import SVC
import numpy as np

df = x_var[['pc1','pc2']].join(y_var["y-var"])

clf = SVC(C=1,gamma=0.0001)
X_train = np.array(df[['pc1', 'pc2']])
y_train = np.array(df["y-var"])
clf.fit(X_train, y_train)
plot_decision_regions(X_train, y_train, clf=clf, legend=2)

This gives me:

[plot: decision regions, every point classified as class 1, no visible decision boundary]

Apparently the SVM classifies everything as "1" (you can't see a decision boundary in the plot). Why don't I get a classification with both 1s and 0s? I have also tried different kernels and ran a grid search, but the result is always the same.

    pc1       pc2          y-var
0   0.519179  0.247208      1
1   0.271661  0.378146      1
2   0.160372  0.395769      1
3   0.131858  0.377220      0
4  -0.082872  0.099886      1
5  -0.018304  0.125293      1
6  -0.075480  0.129186      1
7  -0.120394  0.103077      1
8  -0.079285  0.315473      0
9  -0.061470  0.373005      1
10 -0.114704  0.318144      0
11 -0.036623  0.402758      0
12 -0.266696  0.102101      1
13 -0.304520 -0.044354      1
14 -0.341065 -0.091845      1
15 -0.335393 -0.158577      1
16 -0.294246 -0.172631      1
17 -0.112002  0.107467      0
18 -0.008648  0.039244      0
19 -0.016432 -0.011859      1
20  0.025505 -0.003516      0
21  0.065414 -0.144414      0
22  0.058254 -0.199284      1
23  0.080844 -0.227434      1
24  0.146013 -0.177407      0
25  0.072719 -0.215493      1
26  0.076515 -0.218327      1
27  0.073930 -0.205280      0
28  0.084932 -0.213145      1
29  0.127504 -0.119456      1
30  0.410069 -0.070637      0
31  0.444208 -0.054756      0
32  0.359892 -0.039921      1
33  0.351449  0.039005      1
34  0.340579 -0.061595      1
35  0.195910 -0.088828      1
36  0.169974  0.014353      1
37  0.168284 -0.034547      0
38  0.163418  0.009783      1
39  0.222996 -0.020889      0
40  0.131592  0.197540      1
41  0.035192  0.160503      1
42 -0.005788  0.010568      1
43 -0.146251 -0.078299      0
44 -0.165629 -0.054383      1
45 -0.157875 -0.065957      0
46 -0.144255 -0.038511      1
47 -0.115826 -0.080849      0
48 -0.145774 -0.064944      1
49 -0.218346 -0.008935      1
50 -0.154941 -0.066568      0
51 -0.173926 -0.109107      0
52 -0.191553 -0.059816      1
53 -0.209128 -0.118813      1

2 Answers

SVMs have a number of hyperparameters (such as which C or gamma values to use), and finding the best hyperparameters is not a trivial task. To find the best hyperparameters, you can create a grid of hyperparameter values and then try all of their combinations (hence this method is called grid search).

GridSearchCV takes a dictionary describing the parameters that can be tried when training a model. The parameter grid is defined as a dictionary, where the keys are the parameter names and the values are the settings to be tested. You can use code like this to find the best hyperparameters.

Import grid search:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC 
model = SVC()

Define the parameter grid:

param_grid = {'C': [0.1, 1, 10, 100, 1000],  
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001], 
              'kernel': ['rbf']} 

grid = GridSearchCV(SVC(), param_grid, refit=True, verbose=3)

Fit the grid search model:

grid.fit(X_train, y_train) 
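
After the fit finishes, you can inspect the best hyperparameter combination and, since refit=True, use the grid object directly as the tuned model (a minimal sketch; best_params_, best_score_ and predict are standard GridSearchCV attributes and methods):

# Best hyperparameter combination found by the search
print(grid.best_params_)

# Mean cross-validated accuracy of that combination
print(grid.best_score_)

# refit=True means grid was refitted on the full training data,
# so it can be used for prediction directly
print(grid.predict(X_train))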

More information (references): SVM Hyperparameter Tuning using GridSearchCV | ML

HyperParameter tuning an SVM — a Demonstration using HyperParameter tuning

Your code works fine; the C and gamma values seem to be the problem. In the original code, using clf = SVC(C=1000, gamma=5) and experimenting with other values of C and gamma should produce results.

Output with C=1000, gamma=5:

[plot: decision regions with C=1000, gamma=5, both classes visible]
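
To see why the original settings predict a single class: with gamma=0.0001, the RBF kernel value exp(-gamma * ||x - y||**2) is essentially 1.0 for every pair of these small-valued points, so all samples look identical to the SVM and it falls back to the majority class. A quick check (a hedged sketch using sklearn's rbf_kernel; X_train is the array built from the question's data):

from sklearn.metrics.pairwise import rbf_kernel

# The features lie roughly in [-0.35, 0.52], so squared pairwise
# distances stay near or below 1, and exp(-0.0001 * d**2) is ~1.0
K = rbf_kernel(X_train, gamma=0.0001)
print(K.min(), K.max())  # both ~1.0: the kernel matrix is flat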

Code tested:

from sklearn.decomposition import PCA
from mlxtend.plotting import plot_decision_regions
from sklearn.svm import SVC

pc1 = [-0.114704, -0.036623, -0.266696, -0.304520]
pc2 = [0.318144, 0.402758, 0.102101, -0.044354]
yvar = [0, 0, 1, 1]


import numpy as np
df = np.column_stack((pc1, pc2))

clf = SVC(C=1, gamma=0.0001, kernel='linear')
X_train = np.array(df)
y_train = np.array(yvar)
clf.fit(X_train, y_train)
plot_decision_regions(X_train, y_train, clf=clf, legend=2)

Output:

[plot: decision regions for the four-point linear-kernel example]

Multiplying the features by a larger number (scaling everything up by a factor of 10):

from sklearn.decomposition import PCA
from mlxtend.plotting import plot_decision_regions
from sklearn.svm import SVC



pc1 = [
     0.519179,  0.271661,  0.160372,  0.131858, -0.082872, -0.018304,
    -0.075480, -0.120394, -0.079285, -0.061470, -0.114704, -0.036623,
    -0.266696, -0.304520, -0.341065, -0.335393, -0.294246, -0.112002,
    -0.008648, -0.016432,  0.025505,  0.065414,  0.058254,  0.080844,
     0.146013,  0.072719,  0.076515,  0.073930,  0.084932,  0.127504,
     0.410069,  0.444208,  0.359892,  0.351449,  0.340579,  0.195910,
     0.169974,  0.168284,  0.163418,  0.222996,  0.131592,  0.035192,
    -0.005788, -0.146251, -0.165629, -0.157875, -0.144255, -0.115826,
    -0.145774, -0.218346, -0.154941, -0.173926, -0.191553, -0.209128
]

pc2 = [
     0.247208,  0.378146,  0.395769,  0.377220,  0.099886,  0.125293,
     0.129186,  0.103077,  0.315473,  0.373005,  0.318144,  0.402758,
     0.102101, -0.044354, -0.091845, -0.158577, -0.172631,  0.107467,
     0.039244, -0.011859, -0.003516, -0.144414, -0.199284, -0.227434,
    -0.177407, -0.215493, -0.218327, -0.205280, -0.213145, -0.119456,
    -0.070637, -0.054756, -0.039921,  0.039005, -0.061595, -0.088828,
     0.014353, -0.034547,  0.009783, -0.020889,  0.197540,  0.160503,
     0.010568, -0.078299, -0.054383, -0.065957, -0.038511, -0.080849,
    -0.064944, -0.008935, -0.066568, -0.109107, -0.059816, -0.118813
]

yvar = [
    1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0,
    0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1,
    1, 0, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1
]

pc1 = [i * 10 for i in pc1]
pc2 = [i * 10 for i in pc2]

import numpy as np
df = np.column_stack((pc1, pc2))

#df = x_var[['pc1','pc2']].join(y_var["y-var"])

from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis


#clf = RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1)
#clf = AdaBoostClassifier()
#clf = QuadraticDiscriminantAnalysis()
#clf = KNeighborsClassifier(3)
#clf = DecisionTreeClassifier(max_depth=20)
#clf = SVC(C=1, gamma=0.25)
clf = SVC(C=100, gamma=0.5)

X_train = np.array(df)
y_train = np.array(yvar)
clf.fit(X_train, y_train)
plot_decision_regions(X_train, y_train, clf=clf, legend=2)

Output:

[plot: decision regions on the 10x-scaled data with C=100, gamma=0.5]
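
As an alternative to multiplying the features by 10 by hand, the usual way to put the data on a scale where gamma behaves predictably is to standardize the inputs first. This is a hedged sketch (not part of the original answer) using sklearn's StandardScaler in a pipeline:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Standardize each feature to zero mean and unit variance before the SVM,
# so the same gamma works regardless of the raw units of pc1/pc2
pipe = make_pipeline(StandardScaler(), SVC(C=100, gamma=0.5))
pipe.fit(X_train, y_train)
print(pipe.predict(X_train))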
