How can I use AWS SageMaker for XGBoost hyperparameter tuning from an external application?

Posted 2024-05-29 06:38:03

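No bias intended, but I find it hard to locate anything in the AWS documentation. Microsoft Azure has been much easier for me.

My current setup is:

A binary classification application built entirely in Python, with xgboost as the ML model. The xgboost model uses a set of optimized hyperparameters obtained from SageMaker.
A SageMaker notebook that launches the hyperparameter tuning job for xgboost. I then manually copy and paste the resulting hyperparameters into the xgboost model in the Python application to make predictions.

As you can see, the way I do this is far from ideal. What I want now is to add a piece of code to the Python application that automatically launches the hyperparameter tuning job in SageMaker and returns the best model. That way the tuning is automated and I no longer need to copy and paste.

However, I have not been able to get this working. I followed this documentation to install the SageMaker Python SDK. I also have the following code, which runs XGBoost hyperparameter tuning in a SageMaker notebook: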

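import time

import boto3
import pandas as pd
import sagemaker
from sagemaker.amazon.amazon_estimator import get_image_uri
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter


# Uses the SageMaker Python SDK v1 names that appear below (s3_input, get_image_uri).
# bucket, prefix, region and role are assumed to be defined at module level.
def train_xgb_sagemaker(df_train, df_test):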
    pd.concat([df_train['show_status'], df_train.drop(['show_status'], axis=1)], axis=1).to_csv('train.csv',
                                                                                                index=False,
                                                                                                header=False)
    pd.concat([df_test['show_status'], df_test.drop(['show_status'], axis=1)], axis=1).to_csv('validation.csv',
                                                                                              index=False, header=False)

    # Bucket() takes only the bucket name; the key prefix belongs in the object key.
    s3 = boto3.Session().resource('s3')
    s3.Bucket(bucket).Object('{}/train/train.csv'.format(prefix)).upload_file('train.csv')
    s3.Bucket(bucket).Object('{}/validation/validation.csv'.format(prefix)).upload_file('validation.csv')

    s3_input_train = sagemaker.s3_input(s3_data='s3://{}/{}/train/'.format(bucket, prefix), content_type='csv')
    s3_input_validation = sagemaker.s3_input(s3_data='s3://{}/{}/validation/'.format(bucket, prefix), content_type='csv')

    print('train_path: ', s3_input_train)
    print('validation_path: ', s3_input_validation)

    # hyperparameter tuning of XGBoost - SageMaker
    sess = sagemaker.Session()

    # The version must be the string '0.90-1'; 0.90 - 1 evaluates to a negative float.
    container = get_image_uri(region, 'xgboost', '0.90-1')
    xgb = sagemaker.estimator.Estimator(container,
                                        role,
                                        train_instance_count=1,
                                        train_instance_type='ml.m4.xlarge',
                                        output_path='s3://{}/{}/output'.format(bucket, prefix),
                                        sagemaker_session=sess)

    xgb.set_hyperparameters(eval_metric='auc',
                            objective='binary:logistic',
                            num_round=100,
                            rate_drop=0.3,
                            tweedie_variance_power=1.4)

    hyperparameter_ranges = {'eta': ContinuousParameter(0, 1),
                             'min_child_weight': ContinuousParameter(1, 10),
                             'alpha': ContinuousParameter(0, 2),
                             'max_depth': IntegerParameter(1, 10),
                             'num_round': IntegerParameter(1, 300)}

    objective_metric_name = 'validation:auc'

    tuner = HyperparameterTuner(xgb,
                                objective_metric_name,
                                hyperparameter_ranges,
                                max_jobs=20,
                                max_parallel_jobs=3)

    tuner.fit({'train': s3_input_train, 'validation': s3_input_validation}, include_cls_metadata=False)

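    # Low-level SageMaker client, used here only to report the tuning job status.
    smclient = boto3.Session().client('sagemaker')
    status = smclient.describe_hyper_parameter_tuning_job(
        HyperParameterTuningJobName=tuner.latest_tuning_job.job_name)['HyperParameterTuningJobStatus']
    print('Tuning job status: ', status)

    print('Please check hyperparameter tuning for best models!')
    time.sleep(4000)  # fixed wait; the job may finish earlier or later than this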
    # best_model_path = 's3://{}/{}/output/{}/output/model.tar.gz'.format(bucket, prefix, tuner.best_training_job())
    return tuner.best_training_job()

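So the question is: how do I embed this code in my Python application so that I can do everything in one place? Any hints are greatly appreciated, as I have been stuck on this for days!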

1 Answer

Answer 1 · Posted 2024-05-29 06:38:03

There is in fact a Python SDK call that deploys the model from the best-performing training job of a hyperparameter tuning job:

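# instance count and type are required arguments; the values here are examples
tuner.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')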

See the relevant documentation here
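
Deploying creates a hosted endpoint. If the goal is instead to pull the tuned hyperparameters back into the local xgboost model, as the question describes, one possible approach is to poll the tuning job through the boto3 SageMaker client and read the `BestTrainingJob` entry of the `describe_hyper_parameter_tuning_job` response. The following is a minimal sketch under that assumption, not a definitive implementation: the helper names `wait_for_tuning_job` and `best_hyperparameters` and the `INTEGER_PARAMS` set are hypothetical, and the client is passed in so the functions do not depend on a notebook environment.

```python
import time

# Statuses after which the tuning job will not change any more.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}

# Assumption: these names match the IntegerParameter ranges used in the question.
INTEGER_PARAMS = {"max_depth", "num_round"}


def wait_for_tuning_job(client, job_name, poll_seconds=60, sleep=time.sleep):
    """Poll until the tuning job reaches a terminal status; return its description.

    `client` is expected to behave like boto3.client("sagemaker").
    The caller should still check the returned status before using the result.
    """
    while True:
        desc = client.describe_hyper_parameter_tuning_job(
            HyperParameterTuningJobName=job_name)
        if desc["HyperParameterTuningJobStatus"] in TERMINAL_STATUSES:
            return desc
        sleep(poll_seconds)


def best_hyperparameters(desc):
    """Convert the best job's tuned hyperparameters from strings to numbers."""
    tuned = desc["BestTrainingJob"]["TunedHyperParameters"]
    return {
        name: int(float(value)) if name in INTEGER_PARAMS else float(value)
        for name, value in tuned.items()
    }
```

In the application this might be called as `desc = wait_for_tuning_job(boto3.client("sagemaker"), tuner.latest_tuning_job.job_name)` followed by `params = best_hyperparameters(desc)`. Note that `num_round` is a SageMaker XGBoost hyperparameter; with the open-source xgboost package it would be passed as `num_boost_round` to `xgboost.train` rather than inside the parameter dictionary.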
