A wrapper toolbox that provides a compatibility layer between TPOT, auto-sklearn and OpenML

Detailed description of the arbok Python project


arbok (AutoML wrapper toolbox for OpenML compatibility) provides wrappers for TPOT and auto-sklearn that act as a compatibility layer between these tools and OpenML.

The wrappers extend sklearn's BaseSearchCV and expose the internal attributes that OpenML requires, such as cv_results_, best_index_, best_params_, best_score_ and classes_.
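
As a rough illustration (not part of the original README), the sketch below fits a wrapper on a small synthetic dataset and reads those attributes back; the toy data and the small TPOT budget are assumptions made purely for this example.

from sklearn.datasets import make_classification
from arbok import TPOTWrapper

# Hypothetical toy data, only for illustration
X, y = make_classification(n_samples=100, n_features=5, random_state=0)

clf = TPOTWrapper(generations=5, population_size=10, verbosity=0)
clf.fit(X, y)

print(clf.best_params_)  # parameters of the best pipeline found
print(clf.best_score_)   # its internal cross-validation score
print(clf.classes_)      # class labels seen during fitting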

Installation

pip install arbok

Simple example

import openml
from arbok import AutoSklearnWrapper, TPOTWrapper

task = openml.tasks.get_task(31)
dataset = task.get_dataset()

# Get the AutoSklearn wrapper and pass parameters like you would to AutoSklearn
clf = AutoSklearnWrapper(time_left_for_this_task=3600, per_run_time_limit=360)

# Or get the TPOT wrapper and pass parameters like you would to TPOT
clf = TPOTWrapper(generations=100, population_size=100, verbosity=2)

# Execute the task
run = openml.runs.run_model_on_task(task, clf)
run.publish()
print('URL for run: %s/run/%d' % (openml.config.server, run.run_id))

Preprocessing the data

To make the wrappers more robust, we need to preprocess the data: we can impute missing values and one-hot encode the categorical features.

First, we get a mask that tells us whether or not each feature is categorical.

dataset = task.get_dataset()
_, categorical = dataset.get_data(return_categorical_indicator=True)
categorical = categorical[:-1]  # Remove last index (which is the class)

Next, we set up a preprocessing pipeline. We use the ConditionalImputer, an imputer that can apply different strategies to categorical (nominal) and numerical data.

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder
from arbok import ConditionalImputer

preprocessor = make_pipeline(
    ConditionalImputer(
        categorical_features=categorical,
        strategy="mean",
        strategy_nominal="most_frequent"
    ),
    OneHotEncoder(
        categorical_features=categorical,
        handle_unknown="ignore",
        sparse=False
    )
)

Finally, we pass everything to the wrapper.

clf = AutoSklearnWrapper(
    preprocessor=preprocessor,
    time_left_for_this_task=3600,
    per_run_time_limit=360
)

Limitations

  • Currently only classifiers are implemented, so regression is not possible.
  • For TPOT, the config_dict variable cannot be set, because it causes problems in the API.

Benchmarking

Installing the arbok package also installs the arbench CLI tool. We can generate a JSON config file like this:

from arbok.bench import Benchmark

bench = Benchmark()

config_file = bench.create_config_file(
    # Wrapper parameters
    wrapper={"refit": True, "verbose": False, "retry_on_error": True},

    # TPOT parameters
    tpot={
        "max_time_mins": 6,        # Max total time in minutes
        "max_eval_time_mins": 1    # Max time per candidate in minutes
    },

    # Autosklearn parameters
    autosklearn={
        "time_left_for_this_task": 360,  # Max total time in seconds
        "per_run_time_limit": 60         # Max time per candidate in seconds
    }
)

Then, we can call arbench like this:

arbench --classifier autosklearn --task-id 31 --config config.json

Or call arbok as a Python module:

python -m arbok --classifier autosklearn --task-id 31 --config config.json

Running benchmarks on a batch system

To run a large-scale benchmark, we can create a config file, generate jobs and submit them to a batch system, as shown below.

# We create a benchmark setup where we specify the headers, the interpreter we
# want to use, the directory to where we store the jobs (.sh-files), and we give
# it the config-file we created earlier.
bench = Benchmark(
    headers="#PBS -lnodes=1:cpu3\n#PBS -lwalltime=1:30:00",
    python_interpreter="python3",  # Path to interpreter
    root="/path/to/project/",
    jobs_dir="jobs",
    config_file="config.json",
    log_file="log.json"
)

# Create the config file like we did in the section above
config_file = bench.create_config_file(
    # Wrapper parameters
    wrapper={"refit": True, "verbose": False, "retry_on_error": True},

    # TPOT parameters
    tpot={
        "max_time_mins": 6,        # Max total time in minutes
        "max_eval_time_mins": 1    # Max time per candidate in minutes
    },

    # Autosklearn parameters
    autosklearn={
        "time_left_for_this_task": 360,  # Max total time in seconds
        "per_run_time_limit": 60         # Max time per candidate in seconds
    }
)

# Next, we load the tasks we want to benchmark on from OpenML.
# In this case, we load a list of task id's from study 99.
tasks = openml.study.get_study(99).tasks

# Next, we create jobs for both tpot and autosklearn.
bench.create_jobs(tasks, classifiers=["tpot", "autosklearn"])

# And finally, we submit the jobs using qsub
bench.submit_jobs()

Preprocessing parameters

from arbok import ParamPreprocessor
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import make_pipeline

X = np.array([
    [1, 2, True, "foo", "one"],
    [1, 3, False, "bar", "two"],
    [np.nan, "bar", None, None, "three"],
    [1, 7, 0, "zip", "four"],
    [1, 9, 1, "foo", "five"],
    [1, 10, 0.1, "zip", "six"]
], dtype=object)

# Manually specify types, or use types="detect" to automatically detect types
types = ["numeric", "mixed", "bool", "nominal", "nominal"]

pipeline = make_pipeline(ParamPreprocessor(types="detect"), VarianceThreshold())
pipeline.fit_transform(X)

Output:

[[-0.4472136  -0.4472136   1.41421356 -0.70710678 -0.4472136  -0.4472136
   2.23606798 -0.4472136  -0.4472136  -0.4472136   0.4472136  -0.4472136
  -0.85226648  1.        ]
 [-0.4472136   2.23606798 -0.70710678 -0.70710678 -0.4472136  -0.4472136
  -0.4472136  -0.4472136  -0.4472136   2.23606798  0.4472136  -0.4472136
  -0.5831297  -1.        ]
 [ 2.23606798 -0.4472136  -0.70710678 -0.70710678 -0.4472136  -0.4472136
  -0.4472136  -0.4472136   2.23606798 -0.4472136  -2.23606798  2.23606798
  -1.39054004 -1.        ]
 [-0.4472136  -0.4472136  -0.70710678  1.41421356 -0.4472136   2.23606798
  -0.4472136  -0.4472136  -0.4472136  -0.4472136   0.4472136  -0.4472136
   0.49341743 -1.        ]
 [-0.4472136  -0.4472136   1.41421356 -0.70710678  2.23606798 -0.4472136
  -0.4472136  -0.4472136  -0.4472136  -0.4472136   0.4472136  -0.4472136
   1.031691    1.        ]
 [-0.4472136  -0.4472136  -0.70710678  1.41421356 -0.4472136  -0.4472136
  -0.4472136   2.23606798 -0.4472136  -0.4472136   0.4472136  -0.4472136
   1.30082778  1.        ]]
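
Since the AutoSklearnWrapper accepts an sklearn-compatible transformer through its preprocessor argument, a ParamPreprocessor could presumably be plugged in the same way. The combination below is a sketch based on that assumption, not an example from the original README.

from arbok import AutoSklearnWrapper, ParamPreprocessor

# Assumed combination: reuse the type-detecting preprocessor inside the wrapper
clf = AutoSklearnWrapper(
    preprocessor=ParamPreprocessor(types="detect"),
    time_left_for_this_task=3600,
    per_run_time_limit=360
)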
