A toolbox for tackling imbalanced datasets in machine learning.
imbalanced-learn
imbalanced-learn is a Python package offering a number of re-sampling techniques commonly used in datasets showing strong between-class imbalance. It is compatible with scikit-learn and is part of scikit-learn-contrib projects.
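To illustrate what such a re-sampling technique does, here is a toy pure-Python sketch of a random under-sampler exposing the same `fit_resample(X, y)` method that imbalanced-learn's samplers provide. The class name and internals below are illustrative only, not the library's actual code:

```python
import random

class ToyRandomUnderSampler:
    """Mimics the fit_resample(X, y) interface of imbalanced-learn samplers."""

    def __init__(self, seed=0):
        self.seed = seed

    def fit_resample(self, X, y):
        rng = random.Random(self.seed)
        # Group sample indices by class label.
        by_class = {}
        for i, label in enumerate(y):
            by_class.setdefault(label, []).append(i)
        # Keep as many samples per class as the smallest class has.
        n_min = min(len(idx) for idx in by_class.values())
        keep = []
        for idx in by_class.values():
            keep.extend(rng.sample(idx, n_min))
        keep.sort()
        return [X[i] for i in keep], [y[i] for i in keep]

X = [[i] for i in range(12)]
y = [0] * 9 + [1] * 3          # 9 majority vs. 3 minority samples
X_res, y_res = ToyRandomUnderSampler().fit_resample(X, y)
print(sorted(y_res))           # [0, 0, 0, 1, 1, 1] -- now balanced
```

The real samplers follow the same pattern: construct a sampler object, then call `fit_resample` to obtain a re-balanced copy of the data.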
Documentation
Installation documentation, API documentation, and examples can be found in the documentation.
Installation
Dependencies
imbalanced-learn is tested to work under Python 3.6+. The dependency requirements are based on the latest scikit-learn release:
- scipy (>=0.17)
- numpy (>=1.11)
- scikit-learn (>=0.21)
- joblib (>=0.11)
- keras 2 (optional)
- tensorflow (optional)
Additionally, to run the examples, matplotlib (>=2.0.0) and pandas (>=0.22) are required.
Installation
imbalanced-learn is currently available on PyPI and can be installed via pip:
pip install -U imbalanced-learn
The package is also released on the Anaconda Cloud platform:
conda install -c conda-forge imbalanced-learn
If you prefer, you can clone the repository and run the setup.py file. Use the following commands to get a copy from GitHub and install all dependencies:

git clone https://github.com/scikit-learn-contrib/imbalanced-learn.git
cd imbalanced-learn
pip install .
Or install directly from GitHub with pip:
pip install -U git+https://github.com/scikit-learn-contrib/imbalanced-learn.git
Testing
After installation, you can run the test suite with pytest:
make coverage
发展
这套科学仪器的研制与 在scikit学习社区。因此,您可以参考 Development Guide。
About
If you use imbalanced-learn in a scientific publication, we would appreciate a citation of the following paper:
@article{JMLR:v18:16-365,
  author  = {Guillaume Lema{{\^i}}tre and Fernando Nogueira and Christos K. Aridas},
  title   = {Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning},
  journal = {Journal of Machine Learning Research},
  year    = {2017},
  volume  = {18},
  number  = {17},
  pages   = {1-5},
  url     = {http://jmlr.org/papers/v18/16-365}
}
Most classification algorithms will only perform optimally when the number of samples in each class is roughly the same. Highly skewed datasets, where the minority class is heavily outnumbered by one or more other classes, have proven to be a challenge while at the same time becoming more and more common.
One way of addressing this issue is to re-sample the dataset so as to offset the imbalance, in the hope of arriving at a more robust and fair decision boundary than would otherwise be reached.
Re-sampling techniques fall into the following categories:
- Under-sampling the majority class(es).
- Over-sampling the minority class.
- Combining over- and under-sampling.
- Creating ensemble balanced sets.
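The second category can likewise be sketched in a few lines of plain Python. This toy over-sampler (an illustration under assumed names, not imbalanced-learn's implementation) duplicates minority samples at random until every class matches the majority count:

```python
import random

def random_oversample(X, y, seed=0):
    """Duplicate minority samples (with replacement) until classes balance."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    n_max = max(len(idx) for idx in by_class.values())
    X_res, y_res = list(X), list(y)
    for label, idx in by_class.items():
        # Draw with replacement to make up each class's deficit.
        for i in rng.choices(idx, k=n_max - len(idx)):
            X_res.append(X[i])
            y_res.append(label)
    return X_res, y_res

X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2          # 8 majority vs. 2 minority samples
X_res, y_res = random_oversample(X, y)
print(y_res.count(0), y_res.count(1))  # 8 8
```

Plain duplication is the simplest over-sampling strategy; methods such as SMOTE instead synthesize new minority samples by interpolating between neighbors.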
The methods currently implemented in this module, together with examples of the different algorithms, are presented in the sphinx-gallery of the documentation.