Creating multiple datasets with a single train_test_split command

Posted 2024-04-19 14:14:36


User


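My dataset has 42,000 rows.
I need to split it into training, cross-validation, and test sets in proportions of 60%, 20%, and 20%. This follows Professor Andrew Ng's recommendation in his ML class lectures.
I know that scikit-learn has a function, train_test_split, for this kind of task. However, I cannot get it to produce a 0.6, 0.2, 0.2 split in a single-line command.

What I am doing now is: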

# split data into training, cv and test sets
from sklearn.model_selection import train_test_split

# first call: 60% training, 40% held out for cv and test
train, intermediate_set = train_test_split(input_set, train_size=0.6, test_size=0.4)
# second call: halve the held-out 40% into 20% cv and 20% test
cv, test = train_test_split(intermediate_set, train_size=0.5, test_size=0.5)

# inspect the shape and number of dimensions of each set
print('training shape (tuple of array dimensions) =', train.shape)
print('training dimension (number of array dimensions) =', train.ndim)
print('cv shape (tuple of array dimensions) =', cv.shape)
print('cv dimension (number of array dimensions) =', cv.ndim)
print('test shape (tuple of array dimensions) =', test.shape)
print('test dimension (number of array dimensions) =', test.ndim)

This gives me the result:

[output not preserved in the source]

How can I do this in a single command?


1 answer

User

Answer 1 · posted 2024-04-19 14:14:36

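Read the source code of train_test_split and its companion class ShuffleSplit, and adapt it to your use case. It is not a large function, and it should not be very complicated.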
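As a rough illustration of that kind of adaptation, here is a minimal sketch that shuffles the row indices once and cuts them at two points. It is not scikit-learn code: the helper name three_way_split_indices, its parameters, and the default fractions are all assumptions made for this example, and it uses only the standard library.

```python
import random


def three_way_split_indices(n_rows, train_frac=0.6, cv_frac=0.2, seed=None):
    """Return three disjoint lists of row indices: train, cv and test.

    Hypothetical helper, not part of scikit-learn. Whatever is left after
    the train and cv fractions goes to the test set.
    """
    indices = list(range(n_rows))
    # Shuffle once so the three parts are disjoint random samples.
    random.Random(seed).shuffle(indices)
    # Cut points: end of the training part, then end of the cv part.
    train_end = round(train_frac * n_rows)
    cv_end = round((train_frac + cv_frac) * n_rows)
    return indices[:train_end], indices[train_end:cv_end], indices[cv_end:]


# 42000 rows, as in the question; the seed only makes the shuffle repeatable.
train_idx, cv_idx, test_idx = three_way_split_indices(42000, seed=0)
print(len(train_idx), len(cv_idx), len(test_idx))  # 25200 8400 8400
```

If input_set is a NumPy array, indexing it with each list (for example input_set[train_idx]) should give the three subsets, so the split itself becomes a single call.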
