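I am computing importance scores for my dataset, which has 30 features and one class column, and I do get a plot. However, the values in the array are not sorted by feature importance score. How can I sort the variables by their score with numpy or matplotlib?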
from pandas import read_excel
from sklearn.ensemble import ExtraTreesClassifier
# load data: 30 feature columns followed by the 'class' column
# note: '3,sen17', 'jan19' and 'may18' appear more than once; unique labels
# keep the columns (and the labels in the plot later on) distinguishable
names = ['8,oct18', '3,oct18', '4,oct18', '3,sen17', '3,sen17', '4,sen17', '8,sen17', '3,aug17',
'8,aug17','4,aug17', '3,apr17', '4,apr17', '8,apr17', '3,jan17', '8,jan17', '4,jan17', 'jan19', 'jan19',
'jan19', 'may18', 'may18', 'may18', '11, sen17', '11, dec2017', '12,dec2017', '11,aug 2017',
'12,aug 2017', '11, apr 2017', '12, apr 2017', '30t', 'class']
dataframe = read_excel("/home/qw/myprojects/valuevo/data.xlsx", names=names)
array = dataframe.values
X = array[:, 0:30]  # the 30 feature columns
Y = array[:, 30]    # the class column
# feature extraction: fit an ensemble of extremely randomized trees
model = ExtraTreesClassifier()
model.fit(X, Y)
# one importance score per feature, in the same order as the columns of X
print(names, '=', model.feature_importances_)
import matplotlib.pyplot as plt
import numpy as np
# keep only the 30 feature names (drop 'class') so they line up with the 30 scores
names = names[:30]
x = np.array(names)
y = np.array(model.feature_importances_)
plt.title("RF score")
# scores on the horizontal axis, feature names on the vertical axis
plt.plot(y, x)
plt.show()
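Printing the list of names next to the array of scores makes it hard to see which score belongs to which feature. A minimal sketch, using only the standard library, that pairs each name with its score and sorts the pairs from most to least important; ranked_importances is a hypothetical helper name, and the three-item input is toy data used only to show the output shape:

```python
def ranked_importances(names, importances):
    """Return (name, score) pairs sorted from most to least important."""
    # zip keeps the i-th name next to the i-th score; sorting the pairs,
    # rather than the scores alone, preserves that pairing
    pairs = zip(names, importances)
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# hypothetical toy input, only to show the output shape
for name, score in ranked_importances(["a", "b", "c"], [0.2, 0.5, 0.3]):
    print(f"{name} = {score:.2f}")
# b = 0.50
# c = 0.30
# a = 0.20
```

With the question's variables this would be ranked_importances(names, model.feature_importances_). Because zip stops at the shorter input, passing the 31-item list that still ends in 'class' simply leaves 'class' out.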
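You can use np.argsort to get the sorted order of the values, and then use that index to reorder the x and y values together.

I'm not sure why you call the x values y and the y values x; that is a bit confusing.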
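The np.argsort approach from the answer can be sketched as follows. This is a minimal sketch, not the answerer's original code: sort_by_importance and plot_importances are hypothetical helper names, and drawing a horizontal bar chart instead of a line plot is an assumed presentation choice. The bars are placed at numeric positions and labelled afterwards, so repeated names such as 'jan19' still get separate bars instead of being merged into one category.

```python
import numpy as np
import matplotlib.pyplot as plt


def sort_by_importance(names, importances):
    """Return names and importances reordered by ascending importance.

    np.argsort gives the indices that would sort `importances`; applying
    the same index array to both arrays keeps each name next to its score.
    """
    names = np.asarray(names)
    importances = np.asarray(importances)
    order = np.argsort(importances)  # ascending order
    return names[order], importances[order]


def plot_importances(names, importances, title="RF score"):
    """Horizontal bar chart with the most important feature at the top."""
    sorted_names, sorted_scores = sort_by_importance(names, importances)
    positions = np.arange(len(sorted_scores))
    fig, ax = plt.subplots()
    # barh draws position 0 at the bottom, so ascending order puts the
    # largest score at the top of the chart
    ax.barh(positions, sorted_scores)
    ax.set_yticks(positions)
    ax.set_yticklabels(sorted_names)
    ax.set_xlabel("importance")
    ax.set_title(title)
    fig.tight_layout()
    return fig
```

With the question's variables this would be plot_importances(names, model.feature_importances_) followed by plt.show(), where names holds only the 30 feature names. For a descending list, for example when printing, reverse the index with order[::-1].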