sklearn.model_selection.GroupShuffleSplit does not produce the splits it is expected to

Posted 2024-04-19 01:18:21

Male | A programmer who enjoys writing Python code.

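So, I need to generate test/train/validation splits with predefined groups. I don't want to use LeavePGroupsOut, because I need to separate the data into training and validation sets according to the proportions I choose. The GroupShuffleSplit documentation says the following about the test_size parameter: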

test_size : float, int, None, optional If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples. If None, the value is set to the complement of the train size. By default, the value is set to 0.2. The default will change in version 0.21. It will remain 0.2 only if train_size is unspecified, otherwise it will complement the specified train_size.

However, that is not what happens, as the following code shows:

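from sklearn.model_selection import GroupShuffleSplit
# TR_set: feature matrix, tr_groups: group label of each sample in TR_set
tr, ts = next(GroupShuffleSplit(n_splits=1, test_size=3).split(TR_set, groups=tr_groups))
print(tr)
print(ts)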

This prints, for example:

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 91 92 93 99 101 102 103 104 105 106 107]
[ 26 27 89 90 94 95 96 97 98 100]

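As you can see above, the test set does not contain 3 samples but more than 3, and this is almost always the case. I checked which groups those indices belong to: apparently, when test_size is an integer, it represents the absolute number of test groups, not test samples. I think the documentation is misleading.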
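A small self-contained check of the integer case, assuming a recent scikit-learn and made-up toy data (`group_sizes` below is hypothetical, not the data from the question). With `test_size=3` the test set should always contain exactly 3 distinct groups, so the number of test samples is simply the sum of those groups' sizes.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical toy data: 6 groups (ids 0..5) of unequal size, 23 samples in total
group_sizes = [2, 5, 1, 8, 3, 4]
groups = np.repeat(np.arange(len(group_sizes)), group_sizes)
X = np.zeros((len(groups), 1))  # feature values do not affect the split

gss = GroupShuffleSplit(n_splits=1, test_size=3, random_state=0)
train_idx, test_idx = next(gss.split(X, groups=groups))

# An integer test_size is a number of groups, not samples
print("test groups: ", np.unique(groups[test_idx]))  # always 3 distinct groups
print("test samples:", len(test_idx))  # sum of those 3 groups' sizes
```

Because train_size is left unset, the training set receives the remaining 3 groups, so every sample ends up in exactly one of the two sets.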

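Also, when test_size is a float, the specified ratio is often not respected. This is probably because the groups contain unequal numbers of samples, but there should be a note or warning describing how the split behaves when group sizes are unequal and a test-size ratio is given.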
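The float case can be reproduced with hypothetical, deliberately unbalanced groups. As far as I can tell from the scikit-learn source, the float is applied to the number of unique groups (the test set gets `ceil(test_size * n_groups)` groups), so the fraction of samples in the test set depends entirely on which groups happen to be drawn. This is a sketch on made-up data, not the asker's dataset.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical data: 10 groups, one large group (50 samples) and nine small ones (5 each)
group_sizes = [50] + [5] * 9  # 95 samples in total
groups = np.repeat(np.arange(len(group_sizes)), group_sizes)
X = np.zeros((len(groups), 1))

# test_size=0.1 of 10 groups -> 1 group per split goes to the test set
gss = GroupShuffleSplit(n_splits=5, test_size=0.1, random_state=42)
results = []
for train_idx, test_idx in gss.split(X, groups=groups):
    n_test_groups = len(np.unique(groups[test_idx]))
    frac = len(test_idx) / len(groups)  # fraction of *samples* in the test set
    results.append((n_test_groups, frac))
    print(f"test groups={n_test_groups}, test fraction of samples={frac:.2f}")
```

Each split puts one group in the test set, so the sample fraction is either about 0.05 or about 0.53, never the nominal 0.10.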

^{pr2}$

It gives:

70
38

That is, the test set is 35% of the full set (38 of 108 samples), whereas it should be 10%.
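If what you actually need is a test set holding roughly a given fraction of samples rather than of groups, one workaround is to take whole groups in random order until the target is reached. The helper below, `group_split_by_sample_fraction`, is hypothetical and not part of scikit-learn; since groups are never split, it can overshoot the target by up to one group's size.

```python
import numpy as np

def group_split_by_sample_fraction(groups, test_fraction, seed=None):
    """Hypothetical helper (not a scikit-learn API): shuffle the unique groups
    and move whole groups into the test set until it holds at least
    `test_fraction` of all samples. Returns (train_indices, test_indices)."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    unique, counts = np.unique(groups, return_counts=True)
    target = test_fraction * len(groups)
    test_groups, n_test = [], 0
    for i in rng.permutation(len(unique)):
        if n_test >= target:
            break
        test_groups.append(unique[i])
        n_test += counts[i]
    test_mask = np.isin(groups, test_groups)
    return np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Usage on hypothetical toy data: 10 groups, 42 samples
group_sizes = [2, 5, 1, 8, 3, 4, 6, 2, 7, 4]
groups = np.repeat(np.arange(len(group_sizes)), group_sizes)
train_idx, test_idx = group_split_by_sample_fraction(groups, 0.1, seed=0)
print("test fraction of samples:", len(test_idx) / len(groups))
```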

So either I am missing something, or the documentation is simply wrong.

Thanks.


1 answer

User

#1 · Posted 2024-04-19 01:18:21

There is no bug, but the documentation is incorrect in some respects. I have opened an issue about this on scikit-learn's GitHub page.
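For the question's original goal (train/validation/test splits with predefined groups), one common approach is to apply GroupShuffleSplit twice: first hold out the test groups, then split the remaining groups into train and validation. This is a sketch on hypothetical toy data; keep in mind that both `test_size` values below are fractions of groups, not of samples.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical toy data: 20 groups with 1..9 samples each
rng = np.random.default_rng(0)
group_sizes = rng.integers(1, 10, size=20)
groups = np.repeat(np.arange(20), group_sizes)
X = np.zeros((len(groups), 1))

# Step 1: hold out 20% of the groups (4 of 20) as the test set
outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
trval_idx, test_idx = next(outer.split(X, groups=groups))

# Step 2: split the remaining 16 groups into train (12) and validation (4)
inner = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
tr_rel, val_rel = next(inner.split(X[trval_idx], groups=groups[trval_idx]))

# The inner split returns positions within trval_idx; map them back to original indices
train_idx, val_idx = trval_idx[tr_rel], trval_idx[val_rel]
print(len(train_idx), len(val_idx), len(test_idx))
```

Mapping the inner indices back through `trval_idx` matters: without it, the validation indices would refer to rows of the reduced array rather than the original dataset.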
