kenchi: a set of Python modules for anomaly detection

This is a scikit-learn compatible library for anomaly detection.
Dependencies

- Required dependencies
  - numpy>=1.13.3 (BSD 3-Clause License)
  - scikit-learn>=0.20.0 (BSD 3-Clause License)
  - scipy>=0.19.1 (BSD 3-Clause License)
- Optional dependencies
  - matplotlib>=2.1.2 (PSF-based License)
  - networkx>=2.2 (BSD 3-Clause License)
Installation

You can install kenchi via pip:

pip install kenchi

or via conda:

conda install -c y_ohr_n kenchi
Algorithms
Example
import matplotlib.pyplot as plt
import numpy as np

from kenchi.datasets import load_pima
from kenchi.outlier_detection import *
from kenchi.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

np.random.seed(0)

scaler = StandardScaler()
detectors = [
    FastABOD(novelty=True, n_jobs=-1),
    OCSVM(),
    MiniBatchKMeans(),
    LOF(novelty=True, n_jobs=-1),
    KNN(novelty=True, n_jobs=-1),
    IForest(n_jobs=-1),
    PCA(),
    KDE()
]

# Load the Pima Indians diabetes dataset.
X, y = load_pima(return_X_y=True)
X_train, X_test, _, y_test = train_test_split(X, y)

# Get the current Axes instance
ax = plt.gca()

for det in detectors:
    # Fit the model according to the given training data
    pipeline = make_pipeline(scaler, det).fit(X_train)

    # Plot the Receiver Operating Characteristic (ROC) curve
    pipeline.plot_roc_curve(X_test, y_test, ax=ax)

# Display the figure
plt.show()
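The detectors above all follow the scikit-learn estimator API, where fit learns from training data and predict labels samples as inliers (+1) or outliers (-1). As a minimal sketch of that same convention, here is a standalone z-score detector written in plain NumPy; the ZScoreDetector class and its threshold parameter are illustrative inventions for this example, not part of kenchi's API.

```python
import numpy as np

class ZScoreDetector:
    """Minimal z-score outlier detector illustrating the scikit-learn
    fit/predict convention (+1 = inlier, -1 = outlier).
    A standalone sketch, not part of kenchi."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, X):
        # Estimate per-feature mean and standard deviation from training data.
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0)
        return self

    def anomaly_score(self, X):
        # Maximum absolute z-score across features for each sample.
        X = np.asarray(X, dtype=float)
        return np.abs((X - self.mean_) / self.std_).max(axis=1)

    def predict(self, X):
        # Label samples whose score exceeds the threshold as outliers (-1).
        return np.where(self.anomaly_score(X) > self.threshold, -1, 1)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))

det = ZScoreDetector(threshold=3.0).fit(X_train)
labels = det.predict([[0.0, 0.0], [10.0, 10.0]])
```

Because the class exposes fit, predict, and an anomaly score, it could be dropped into the same kind of loop as the kenchi detectors in the example above.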
References
[1] Angiulli, F., and Pizzuti, C., “Fast outlier detection in high dimensional spaces,” In Proceedings of PKDD, pp. 15-27, 2002.
[2] Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J., “LOF: identifying density-based local outliers,” In Proceedings of SIGMOD, pp. 93-104, 2000.
[3] Dua, D., and Karra Taniskidou, E., “UCI Machine Learning Repository,” 2017.
[4] Goix, N., “How to evaluate the quality of unsupervised anomaly detection algorithms?” In ICML Anomaly Detection Workshop, 2016.
[5] Goldstein, M., and Dengel, A., “Histogram-based outlier score (HBOS): A fast unsupervised anomaly detection algorithm,” KI: Poster and Demo Track, pp. 59-63, 2012.
[6] Ide, T., Lozano, C., Abe, N., and Liu, Y., “Proximity-based anomaly detection using sparse structure learning,” In Proceedings of SDM, pp. 97-108, 2009.
[7] Kriegel, H.-P., Kröger, P., Schubert, E., and Zimek, A., “Interpreting and unifying outlier scores,” In Proceedings of SDM, pp. 13-24, 2011.
[8] Kriegel, H.-P., Schubert, M., and Zimek, A., “Angle-based outlier detection in high-dimensional data,” In Proceedings of SIGKDD, pp. 444-452, 2008.
[9] Lee, W. S., and Liu, B., “Learning with positive and unlabeled examples using weighted Logistic Regression,” In Proceedings of ICML, pp. 448-455, 2003.
[10] Liu, F. T., Ting, K. M., and Zhou, Z.-H., “Isolation forest,” In Proceedings of ICDM, pp. 413-422, 2008.
[11] Parzen, E., “On estimation of a probability density function and mode,” Ann. Math. Statist., 33(3), pp. 1065-1076, 1962.
[12] Ramaswamy, S., Rastogi, R., and Shim, K., “Efficient algorithms for mining outliers from large data sets,” In Proceedings of SIGMOD, pp. 427-438, 2000.
[13] Schölkopf, B., Platt, J. C., Shawe-Taylor, J. C., Smola, A. J., and Williamson, R. C., “Estimating the Support of a High-Dimensional Distribution,” Neural Computation, 13(7), pp. 1443-1471, 2001.
[14] Sugiyama, M., and Borgwardt, K., “Rapid distance-based outlier detection via sampling,” Advances in NIPS, pp. 467-475, 2013.