Active learning in Python

Detailed description of the alipy Python project


ALiPy: Active Learning in Python

Authors: Ying-Peng Tang, Guo-Xiang Li, Sheng-Jun Huang

Online document: http://parnec.nuaa.edu.cn/huangsj/alipy/

Offline document: http://parnec.nuaa.edu.cn/huangsj/alipy/offline_ver/alipy_doc_v120.zip

Introduction

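ALiPy is an active learning toolkit implemented in Python. It ships with more than 20 active learning algorithms and provides tools for data processing, result visualization, and more. ALiPy offers a set of independent tool classes, one for each component of the active learning framework. This makes it easy to support different active learning scenarios and lets users organize their projects freely: users can implement their own algorithms and replace parts of a project without inheriting any interface. ALiPy also supports a variety of active learning settings, such as cost-sensitive labeling, noisy oracles, and multi-label querying. For a detailed introduction and documentation, please refer to the toolkit's official website.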

ALiPy provides a module-based implementation of the active learning framework, which allows users to conveniently evaluate, compare and analyze the performance of active learning methods. It implements more than 20 algorithms and also makes it easy for users to implement their own approaches under different settings.

Features of alipy include:

  • Model independent

    • There is no limitation on the classification model. You can use an SVM from sklearn or a deep model in tensorflow, as needed.
  • Module independent

    • One can freely modify one or more modules of the toolbox without affecting the others.
  • Implement your own algorithm without inheriting anything

    • User-defined functions face few restrictions, for example on their parameters or names.
  • Various settings supported

    • Noisy oracles, Multi-label, Cost effective, Feature querying, etc.
  • Powerful tools

    • Save intermediate results of each iteration AND recover the program from any breakpoints.
    • Parallelize the k-fold experiments.
    • Gather, process and visualize the experiment results.
    • Provide 25 algorithms.
    • Support 7 different settings.

For more detailed introduction and tutorial, please refer to the website of alipy.

Setup

You can get alipy simply by:

pip install alipy

Or clone alipy source code to your local directory and build from source:

cd ALiPy
python setup.py install

The dependencies of alipy are:

  1. Python dependency
python >= 3.4
  2. Basic dependencies
numpy
scipy
scikit-learn
matplotlib
prettytable
  3. Optional dependencies
cvxpy

Note that the basic dependencies must be installed, while the optional dependencies are required only if you need to invoke the KDD'13 BMDR and AAAI'19 SPAL methods in alipy. (cvxpy is not installed by pip install alipy.)
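
Because cvxpy is not pulled in by pip install alipy, it can help to check up front which packages are importable. The following is a minimal sketch using only the standard library. The helper names has_module and missing_modules are hypothetical and not part of alipy; the module names are the import names of the packages listed above (scikit-learn is imported as sklearn).

```python
import importlib.util

# Import names of the packages listed above (scikit-learn imports as "sklearn").
BASIC = ["numpy", "scipy", "sklearn", "matplotlib", "prettytable"]
OPTIONAL = ["cvxpy"]  # only needed for the BMDR and SPAL methods


def has_module(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def missing_modules(names):
    """Return the names from `names` that are not importable, in order."""
    return [name for name in names if not has_module(name)]


if __name__ == "__main__":
    print("missing basic dependencies:", missing_modules(BASIC))
    print("missing optional dependencies:", missing_modules(OPTIONAL))
```

An empty first list means the basic requirements are met; a non-empty second list only matters if you plan to use BMDR or SPAL.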

Tools in alipy

The tool classes provided by alipy cover as many components of active learning as possible, with the aim of supporting experiment implementation through miscellaneous tool functions. These tools are loosely coupled, so users can structure their experiment projects as they see fit.

  • Use alipy.data_manipulate to preprocess and split your data sets for experiments.

  • Use alipy.query_strategy to invoke traditional and state-of-the-art methods.

  • Use alipy.index.IndexCollection to manage your labeled indexes and unlabeled indexes.

  • Use alipy.metric to calculate your model performances.

  • Use alipy.experiment.state and alipy.experiment.state_io to save the intermediate results after each query and recover the program from the breakpoints.

  • Use alipy.experiment.stopping_criteria to get some example stopping criteria.

  • Use alipy.experiment.experiment_analyser to gather, process and visualize your experiment results.

  • Use alipy.oracle to implement clean, noisy and cost-sensitive oracles.

  • Use alipy.utils.multi_thread to parallelize your k-fold experiments.
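
The save-and-recover idea behind alipy.experiment.state and alipy.experiment.state_io can be illustrated with plain Python. The sketch below is not alipy's implementation or file format: the functions save_states and load_states and the dictionary layout are assumptions made for illustration, and the index and performance values are placeholders. It pickles the list of per-query records after every query so that an interrupted run can resume from the last saved record.

```python
import os
import pickle
import tempfile


def save_states(path, states):
    """Write all per-query records to `path`, replacing the file atomically."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(states, f)
    os.replace(tmp_path, path)  # avoids a half-written file after a crash


def load_states(path):
    """Return the saved records, or an empty list if nothing was saved yet."""
    if not os.path.exists(path):
        return []
    with open(path, "rb") as f:
        return pickle.load(f)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "fold_0.pkl")
    # First run: two queries are made, then the program "stops".
    states = load_states(path)
    for select_index, performance in [([12], 0.70), ([47], 0.74)]:
        states.append({"select_index": select_index, "performance": performance})
        save_states(path, states)
    # Second run: recover the records and continue from the breakpoint.
    recovered = load_states(path)
    return len(recovered), recovered[-1]["performance"]
```

Saving after every query costs some I/O, but it bounds the work lost to an interruption to a single iteration.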

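Implemented query strategies

alipy currently provides several commonly used strategies, and new algorithms will continue to be added in subsequent updates.

  • AL with Instance Selection: Uncertainty (SIGIR 1994), Graph Density (CVPR 2012), QUIRE (TPAMI 2014), SPAL (AAAI 2019), Query By Committee (ICML 1998), Random, BMDR (KDD 2013), LAL (NIPS 2017), Expected Error Reduction (ICML 2001)

  • AL for Multi-Label Data: AUDI (ICDM 2013), QUIRE (TPAMI 2014), Random, MMC (KDD 2009), Adaptive (IJCAI 2013)

  • AL by Querying Features: AFASMC (KDD 2018), Stability (ICDM 2013), Random

  • AL with Different Costs: HALC (IJCAI 2018), Random, Cost performance

  • AL with Noisy Oracles: CEAL (IJCAI 2017), IEthresh (KDD 2009), All, Random

  • AL with Novel Query Types: AURO (IJCAI 2015)

  • AL for Large Scale Tasks: Subsampling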
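
To make the instance-selection setting concrete, here is a simplified sketch of uncertainty sampling with the least-confident measure: the instance whose most probable class has the lowest predicted probability is queried first. This is an illustration in plain Python, not the code alipy uses; the function name select_least_confident, its arguments and the probability values are hypothetical.

```python
def select_least_confident(unlab_ind, probabilities, batch_size=1):
    """Pick the `batch_size` unlabeled indexes the model is least sure about.

    unlab_ind     -- indexes of the unlabeled instances
    probabilities -- one list of class probabilities per entry of unlab_ind
    """
    if len(unlab_ind) != len(probabilities):
        raise ValueError("need one probability vector per unlabeled index")
    # Confidence of a prediction = probability of its most likely class.
    scored = [(max(p), idx) for idx, p in zip(unlab_ind, probabilities)]
    # Lowest confidence first; ties are broken by the smaller index.
    scored.sort()
    return [idx for _, idx in scored[:batch_size]]


unlab_ind = [3, 8, 15]
probabilities = [
    [0.90, 0.05, 0.05],  # index 3: confident prediction
    [0.40, 0.35, 0.25],  # index 8: least confident prediction
    [0.60, 0.30, 0.10],  # index 15
]
selected = select_least_confident(unlab_ind, probabilities, batch_size=2)
print(selected)  # [8, 15]
```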

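Implement your own algorithm

In alipy, there is no restriction on your implementation. You only need to ensure that the returned selected indexes are a subset of the unlabeled indexes.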

select_ind = my_query(unlab_ind, **my_parameters)
# <= (subset) rather than < (proper subset), so selecting the whole pool is allowed
assert set(select_ind) <= set(unlab_ind)
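
As an illustration of that single constraint, the sketch below fills in my_query with a random strategy. The batch_size and seed parameters are assumptions chosen for the example; alipy does not prescribe them.

```python
import random


def my_query(unlab_ind, batch_size=1, seed=None):
    """Randomly choose up to `batch_size` indexes from the unlabeled pool."""
    pool = sorted(unlab_ind)  # fixed order makes a seeded run reproducible
    rng = random.Random(seed)
    return rng.sample(pool, min(batch_size, len(pool)))


unlab_ind = {4, 9, 17, 23, 42}
select_ind = my_query(unlab_ind, batch_size=3, seed=0)

# The only requirement: every selected index comes from the unlabeled pool.
assert set(select_ind) <= set(unlab_ind)
```

Capping the sample size at the pool size keeps the function usable in the last iterations, when fewer than batch_size instances remain unlabeled.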

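Usage

There are two ways to use alipy. For a high-level encapsulation, you can use the alipy.experiment.AlExperiment class. Note that AlExperiment only supports the most commonly used scenario: querying all labels of an instance. This class can run an experiment in only a few lines of code. You only need to specify the options, and the query process will run in multiple threads. Note that if you want to use this class with your own algorithm, some constraints must be satisfied; please see the API reference of this class.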

from sklearn.datasets import load_iris
from alipy.experiment.al_experiment import AlExperiment

X, y = load_iris(return_X_y=True)
# Stop each fold after 50 queries
al = AlExperiment(X, y, stopping_criteria='num_of_queries', stopping_value=50)
al.split_AL()
al.set_query_strategy(strategy="QueryInstanceUncertainty", measure='least_confident')
al.set_performance_metric('accuracy_score')
# Run the query process in multiple threads, then plot the result
al.start_query(multi_thread=True)
al.plot_learning_curve()
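
The idea of running each fold in its own thread can be sketched with the standard library alone. This is not how AlExperiment is implemented: run_fold is a hypothetical stand-in for one complete query loop, and the values it records are placeholders rather than experiment results.

```python
from concurrent.futures import ThreadPoolExecutor


def run_fold(fold, num_queries=5):
    """Hypothetical stand-in for one fold's query loop.

    A real loop would select, label, retrain and evaluate in every iteration;
    here each 'query' just records a placeholder value so the sketch runs.
    """
    return [(fold, query) for query in range(num_queries)]


def run_all_folds(split_count=10, max_workers=4):
    """Run every fold in a worker thread and return results in fold order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves the order of its input, whatever finishes first.
        return list(pool.map(run_fold, range(split_count)))


results = run_all_folds(split_count=3)
print(len(results), results[2][0])  # 3 (2, 0)
```

Folds are independent of each other, so no locking is needed as long as each fold works on its own copies of the index sets and writes to its own result file.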

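To customize your own active learning experiment, it is recommended to follow the examples in alipy/examples and the tutorials on the alipy main page, and to pick the tools that fit your use case. This way, the logic of your program is fully transparent to you and therefore easy to debug, and parts of the active learning process can be replaced by your own implementation for special needs.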

import copy
from sklearn.datasets import load_iris
from alipy import ToolBox

X, y = load_iris(return_X_y=True)
alibox = ToolBox(X=X, y=y, query_type='AllLabels', saving_path='.')

# Split data
alibox.split_AL(test_ratio=0.3, initial_label_rate=0.1, split_count=10)

# Use the default Logistic Regression classifier
model = alibox.get_default_model()

# The query budget is 50 queries per fold
stopping_criterion = alibox.get_stopping_criterion('num_of_queries', 50)

# Use pre-defined strategy
QBCStrategy = alibox.get_query_strategy(strategy_name='QueryInstanceQBC')
QBC_result = []

for fold in range(10):  # named `fold` to avoid shadowing the built-in round()
    # Get the data split of one fold experiment
    train_idx, test_idx, label_ind, unlab_ind = alibox.get_split(fold)
    # Get intermediate results saver for one fold experiment
    saver = alibox.get_stateio(fold)

    while not stopping_criterion.is_stop():
        # Select a subset of unlab_ind according to the query strategy
        # Pass model=None to use the default model for evaluating the committee's disagreement
        select_ind = QBCStrategy.select(label_ind, unlab_ind, model=None, batch_size=1)
        label_ind.update(select_ind)
        unlab_ind.difference_update(select_ind)

        # Update model and calc performance according to the model you are using
        model.fit(X=X[label_ind.index, :], y=y[label_ind.index])
        pred = model.predict(X[test_idx, :])
        accuracy = alibox.calc_performance_metric(y_true=y[test_idx],
                                                  y_pred=pred,
                                                  performance_metric='accuracy_score')

        # Save intermediate results to file
        st = alibox.State(select_index=select_ind, performance=accuracy)
        saver.add_state(st)
        saver.save()

        # Passing the current progress to stopping criterion object
        stopping_criterion.update_information(saver)
    # Reset the progress in stopping criterion object
    stopping_criterion.reset()
    QBC_result.append(copy.deepcopy(saver))

analyser = alibox.get_experiment_analyser(x_axis='num_of_queries')
analyser.add_method(method_name='QBC', method_results=QBC_result)
print(analyser)
analyser.plot_learning_curves(title='Example of AL', std_area=True)
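
A learning curve over several folds is typically summarized as the mean performance after each query, with a band of one standard deviation around it. The sketch below shows that aggregation in plain Python. Whether the analyser computes its std_area exactly this way (for example, population versus sample standard deviation) is an assumption, and the accuracy values are invented for the example.

```python
from statistics import mean, pstdev


def summarize_folds(curves):
    """Return (means, stds) per query from a list of per-fold curves.

    curves -- one list of performance values per fold, all of equal length
    """
    if len({len(curve) for curve in curves}) != 1:
        raise ValueError("all folds must contain the same number of queries")
    per_query = list(zip(*curves))  # regroup: one tuple of fold values per query
    means = [mean(values) for values in per_query]
    stds = [pstdev(values) for values in per_query]
    return means, stds


# Invented accuracies for 2 folds and 3 queries, for illustration only.
curves = [
    [0.5, 0.75, 1.0],
    [0.5, 0.25, 0.5],
]
means, stds = summarize_folds(curves)
print(means)  # [0.5, 0.5, 0.75]
print(stds)   # [0.0, 0.25, 0.25]
```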

Citation

Please cite our work:

Tang, Y.-P.; Li, G.-X.; and Huang, S.-J. 2019. ALiPy: Active learning in python.
Technical report, Nanjing University of Aeronautics and Astronautics.
Available as arXiv preprint https://arxiv.org/abs/1901.03802.