I have 42000 rows that I want to split into training, cross-validation and test sets, with a 60% / 20% / 20% split. This is what Prof. Andrew Ng recommends in his ML course lectures. For the 0.6 / 0.2 / 0.2 split, what I do is:
# split data into training, cv and test sets
from sklearn.model_selection import train_test_split  # the old sklearn.cross_validation module is deprecated
train, intermediate_set = train_test_split(input_set, train_size=0.6, test_size=0.4)
cv, test = train_test_split(intermediate_set, train_size=0.5, test_size=0.5)
# inspecting the resulting datasets
print('training shape (tuple of array dimensions) =', train.shape)
print('training dimension (number of array dimensions) =', train.ndim)
print('cv shape (tuple of array dimensions) =', cv.shape)
print('cv dimension (number of array dimensions) =', cv.ndim)
print('test shape (tuple of array dimensions) =', test.shape)
print('test dimension (number of array dimensions) =', test.ndim)
which gives me the shapes I expect. How can I accomplish this in a single command?
Read the source code of train_test_split and its companion class ShuffleSplit, and adapt them to your use case. It is not a large function and should not be complicated.
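One way to follow that suggestion is to skip sklearn entirely: shuffle the indices once and slice by the cumulative ratios. A minimal sketch, assuming a NumPy array input and a hypothetical helper name `train_cv_test_split`:

```python
import numpy as np

def train_cv_test_split(data, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle the rows of `data` and split them by the given ratios."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))          # one shuffle for all three sets
    n_train = int(ratios[0] * len(data))
    n_cv = int(ratios[1] * len(data))
    return (data[idx[:n_train]],
            data[idx[n_train:n_train + n_cv]],
            data[idx[n_train + n_cv:]])       # test gets the remainder

# stand-in for the 42000-row dataset from the question
X = np.arange(42000).reshape(-1, 1)
train, cv, test = train_cv_test_split(X)
print(train.shape, cv.shape, test.shape)  # (25200, 1) (8400, 1) (8400, 1)
```

Because the remainder goes to the test set, the three splits always cover every row exactly once, even when the ratios do not divide the row count evenly.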