Can you use the Isolation Forest algorithm on large samples?

1 Answer

User

#1 · Posted on 2024-06-16 08:55:51

Quoting the original paper:

The isolation characteristic of iTrees enables them to build partial models and exploit sub-sampling to an extent that is not feasible in existing methods. Since a large part of an iTree that isolates normal points is not needed for anomaly detection; it does not need to be constructed. A small sample size produces better iTrees because the swamping and masking effects are reduced.

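From your question, I get the feeling that you are confusing the size of the dataset with the size of the sample drawn from it to build each iTree. Isolation Forest can handle very large datasets, and it actually works better when it sub-samples them.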
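To make the distinction concrete, here is a minimal sketch, assuming scikit-learn's `IsolationForest` is the implementation in use (the thread does not say which one) and using synthetic data invented purely for illustration. The dataset has about 100,000 rows, while `max_samples=256` caps the number of rows each tree is built from.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration only):
# 100,000 normal points plus 100 anomalies placed far from them.
normal = rng.normal(loc=0.0, scale=1.0, size=(100_000, 2))
anomalies = rng.uniform(low=6.0, high=8.0, size=(100, 2))
X = np.vstack([normal, anomalies])

# max_samples is the sub-sample size drawn for EACH tree,
# not a limit on how large the dataset may be.
forest = IsolationForest(n_estimators=100, max_samples=256, random_state=0)
forest.fit(X)

# score_samples returns one score per row; lower means more anomalous.
scores = forest.score_samples(X)
print("mean score, normal points:", scores[:100_000].mean())
print("mean score, anomalies:    ", scores[100_000:].mean())
```

Every row of the large dataset still gets a score; only the tree-building step works on small sub-samples.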

The original paper discusses this in section 3:

The data set has two anomaly clusters located close to one large cluster of normal points at the centre. There are interfering normal points surrounding the anomaly clusters, and the anomaly clusters are denser than normal points in this sample of 4096 instances. Figure 4(b) shows a sub-sample of 128 instances of the original data. The anomalies clusters are clearly identifiable in the sub-sample. Those normal instances surrounding the two anomaly clusters have been cleared out, and the size of anomaly clusters becomes smaller which makes them easier to identify. When using the entire sample, iForest reports an AUC of 0.67. When using a sub-sampling size of 128, iForest achieves an AUC of 0.91.
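If you want to check the effect of the sub-sampling size on your own data, a comparison along the lines of the paper's experiment can be sketched as below. This assumes scikit-learn and a synthetic dataset that only loosely imitates the described layout (two small, dense anomaly clusters near one large normal cluster); it does not reproduce the paper's data, and the AUC values it prints should not be expected to match the quoted 0.67 and 0.91.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

# Hypothetical data: 4000 normal points and two dense anomaly
# clusters of 48 points each (4096 instances in total).
normal = rng.normal(loc=0.0, scale=1.0, size=(4000, 2))
cluster_a = rng.normal(loc=[3.0, 3.0], scale=0.1, size=(48, 2))
cluster_b = rng.normal(loc=[-3.0, 3.0], scale=0.1, size=(48, 2))
X = np.vstack([normal, cluster_a, cluster_b])
y = np.r_[np.zeros(4000), np.ones(96)]  # 1 marks an anomaly


def auc_for(max_samples):
    """Fit a forest with the given per-tree sample size and return its AUC."""
    forest = IsolationForest(
        n_estimators=100, max_samples=max_samples, random_state=0
    ).fit(X)
    # Negate so that higher values mean "more anomalous" for roc_auc_score.
    return roc_auc_score(y, -forest.score_samples(X))


results = {size: auc_for(size) for size in (128, len(X))}
for size, auc in results.items():
    print(f"max_samples={size}: AUC={auc:.3f}")
```

Which setting wins depends on the data, so treat this as a way to measure the effect rather than as proof of it.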

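Isolation Forest is not a perfect algorithm, and its parameters need to be tuned for your specific data. It may even perform poorly on some datasets. If you want to consider other methods, Local Outlier Factor is also included in [library name lost in extraction]. You can also combine several methods (an ensemble).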
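As a rough sketch of combining the two detectors, the example below assumes scikit-learn's `IsolationForest` and `LocalOutlierFactor` and averages their normalized ranks. Rank averaging is just one simple way to build an ensemble, chosen here for illustration; the helper `to_rank` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(1)

# Hypothetical data: 2000 normal points plus 20 scattered outliers.
X = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(2000, 2)),
    rng.uniform(low=-6.0, high=6.0, size=(20, 2)),
])


def to_rank(scores):
    """Map scores to ranks in [0, 1]; the highest score gets rank 1."""
    return scores.argsort().argsort() / (len(scores) - 1)


# Both detectors report "lower = more anomalous", so negate them.
iso = IsolationForest(max_samples=256, random_state=0).fit(X)
iso_scores = -iso.score_samples(X)

lof = LocalOutlierFactor(n_neighbors=20).fit(X)
lof_scores = -lof.negative_outlier_factor_

# Average the two rankings; higher means more anomalous.
combined = (to_rank(iso_scores) + to_rank(lof_scores)) / 2
top = np.argsort(combined)[-20:]
print("indices of the 20 most anomalous points:", np.sort(top))
```

Ranks are used instead of raw scores because the two detectors score on different scales.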

Here you can find a good comparison of the different methods.
