mlrun Python package - PyPI

Tracking and configuration of machine learning runs

Detailed description of the mlrun Python project

mlrun

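A generic, easy-to-use mechanism for data scientists and developers/engineers to describe and track the code, metadata, inputs, and outputs of machine-learning-related tasks (executions).

See the linked project document for more details.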

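General concept and motivation

A developer or data scientist writes code in a local IDE or notebook, and later wants to run the same code on a larger cluster using scale-out containers or functions. Once the code is ready, they may want to incorporate it into an automated ML workflow (for example, using Kubeflow Pipelines).

In each runtime environment they want to use different configurations, parameters, and data. They also want to record and version all outputs and the associated inputs (lineage). Data artifacts on a cluster usually originate from a remote data store such as S3 or GCS, which adds the complexity of transferring data.

Wouldn't it be great if we could write the code once with simple local semantics and let some layer automate the data movement, versioning, parameter substitution, output tracking, and so on?

This is the goal of this package.

The code is in the early stages of development, and we would like to foster wide industry collaboration. The idea is to make all resources pluggable, so that developers code to one API and can use various open source projects or commercial products.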

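Architecture

A user instantiates a context object using the get_or_create_ctx method. Reading or writing metadata, secrets, inputs, or outputs is done through the context object. The context object can be used as is locally; at the same time, context can be injected via the API, CLI, RPC, environment variables, or other mechanisms.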
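The idea of a context that works locally with defaults and can also receive injected values can be illustrated with a small stand-alone sketch. ToyContext, toy_get_or_create_ctx, and the TOY_RUN_PARAMS environment variable are hypothetical names made up for this illustration; this is not how mlrun implements get_or_create_ctx, only a minimal model of the behavior described above.

```python
import json
import os


class ToyContext:
    """Hypothetical stand-in for a run context (not the mlrun class)."""

    def __init__(self, name, injected_params=None):
        self.name = name
        self._params = dict(injected_params or {})
        self.results = {}

    def get_param(self, key, default=None):
        # An injected value wins; otherwise fall back to the code's default.
        return self._params.get(key, default)

    def log_result(self, key, value):
        # Keep results in memory; a real framework would persist them.
        self.results[key] = value


def toy_get_or_create_ctx(name, env=None):
    # Assumption: parameters are injected as a JSON object in an
    # environment variable; real frameworks may use other channels.
    env = os.environ if env is None else env
    injected = json.loads(env.get("TOY_RUN_PARAMS", "{}"))
    return ToyContext(name, injected)


# Local use: nothing injected, so the default in the code applies.
local_ctx = toy_get_or_create_ctx("train", env={})
# Injected use: the same code sees p1=5 without being modified.
remote_ctx = toy_get_or_create_ctx("train", env={"TOY_RUN_PARAMS": '{"p1": 5}'})
print(local_ctx.get_param("p1", 1), remote_ctx.get_param("p1", 1))
```

The point of the design is that the job code only ever calls get_param with a default, so it does not need to know whether it runs locally or under an orchestrator.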

Check out the training example.

Example code

from mlrun import get_or_create_ctx
from mlrun.artifacts import ChartArtifact, TableArtifact


def my_job():
    # load MLRUN runtime context (will be set by the runtime framework e.g. KubeFlow)
    context = get_or_create_ctx('train')

    # get parameters from the runtime context (or use defaults)
    p1 = context.get_param('p1', 1)
    p2 = context.get_param('p2', 'a-string')

    # access input metadata, values, files, and secrets (passwords)
    print(f'Run: {context.name} (uid={context.uid})')
    print(f'Params: p1={p1}, p2={p2}')
    print('accesskey = {}'.format(context.get_secret('ACCESS_KEY')))
    print('file\n{}\n'.format(context.get_object('infile.txt').get()))

    # RUN some useful code e.g. ML training, data prep, etc.

    # log scalar result values (job result metrics)
    context.log_result('accuracy', p1 * 2)
    context.log_result('loss', p1 * 3)
    context.set_label('framework', 'sklearn')

    # log various types of artifacts (file, web page, table), will be versioned and visible in the UI
    context.log_artifact('model.txt', body=b'abc is 123', labels={'framework': 'xgboost'})
    context.log_artifact('results.html', body=b'<b> Some HTML <b>', viewer='web-app')
    context.log_artifact(TableArtifact('dataset.csv', '1,2,3\n4,5,6\n',
                                       viewer='table', header=['A', 'B', 'C']))

    # create a chart output (will show in the pipelines UI)
    chart = ChartArtifact('chart.html')
    chart.labels = {'type': 'roc'}
    chart.header = ['Epoch', 'Accuracy', 'Loss']
    for i in range(1, 8):
        chart.add_row([i, i / 20 + 0.75, 0.30 - i / 20])

    context.log_artifact(chart)


if __name__ == "__main__":
    my_job()

Running the function inline or with a specific runtime

Users can invoke code through the run_start library function; see the examples notebook.

from mlrun import run_start
import yaml

# note: you need to create/specify a secrets file with credentials for remote data access (e.g. in S3 or v3io)
run_spec = {'metadata': {'labels': {'owner': 'yaronh'}},
            'spec': {'parameters': {'p1': 5},
                     'input_objects': [],
                     'log_level': 'info',
                     'secret_sources': [{'kind': 'file', 'source': 'secrets.txt'}],
                     }}

task = run_start(run_spec, command='example1.py', rundb='./')
print(yaml.dump(task))

Users can select the runtime to use (inline code, subprocess, dask, horovod, nuclio) through parameters of the run_start call. See the examples notebook for details.

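Using hyperparameters

The same code can be run multiple times with different parameters per run. This is done by simply setting the hyperparams attribute, for example: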

# note: you need to create/specify a secrets file with credentials for remote data access (e.g. in S3 or v3io)
run_spec = {'metadata': {'labels': {'owner': 'yaronh'}},
            'spec': {'parameters': {'p1': 5},
                     'input_objects': [],
                     'log_level': 'info',
                     'secret_sources': [{'kind': 'file', 'source': 'secrets.txt'}],
                     }}

hyper = {'p2': ['aa', 'bb', 'cc']}

task = run_start(run_spec, command='example1.py', rundb='./', hyperparams=hyper)
print(yaml.dump(task))
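To make the "one run per parameter set" idea concrete, here is a minimal stand-alone sketch of how a hyperparams dictionary like the one above could be expanded into per-run parameter sets. It assumes a full grid (every combination of values); the source does not say which expansion strategy mlrun uses, and expand_hyperparams is a hypothetical helper, not part of the mlrun API.

```python
from itertools import product


def expand_hyperparams(base_params, hyperparams):
    """Return one parameter dict per combination of hyperparameter values."""
    keys = list(hyperparams)
    runs = []
    # product() yields every combination of the listed values (a full grid).
    for values in product(*(hyperparams[k] for k in keys)):
        params = dict(base_params)        # start from the fixed parameters
        params.update(zip(keys, values))  # overlay this combination
        runs.append(params)
    return runs


runs = expand_hyperparams({"p1": 5}, {"p2": ["aa", "bb", "cc"]})
for iteration, params in enumerate(runs, start=1):
    print(iteration, params)
```

With a single hyperparameter of three values, as in the example above, this produces three parameter sets that share p1=5 and differ only in p2.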

From the CLI

Replacing runtime context parameters

python -m mlrun run -p p1=5 -s file=secrets.txt -i infile.txt=s3://mybucket/infile.txt training.py

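When running the command above:

The parameter p1 will be overridden with 5
The file infile.txt will be loaded from a remote S3 bucket
The credentials (for S3 and the app) will be loaded from the secrets.txt file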
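The key=value style of the -p, -s, and -i options above can be modeled with a short argparse sketch. This is a toy parser written for illustration ("toy-run" and parse_key_values are hypothetical names); it mirrors only the three short options shown in the command and is not mlrun's actual CLI implementation.

```python
import argparse


def parse_key_values(pairs):
    """Turn ['k=v', ...] into a dict, splitting on the first '=' only."""
    result = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        result[key] = value
    return result


parser = argparse.ArgumentParser(prog="toy-run")
# Each option may be repeated; argparse collects the values in a list.
parser.add_argument("-p", dest="params", action="append", default=[])
parser.add_argument("-s", dest="secrets", action="append", default=[])
parser.add_argument("-i", dest="inputs", action="append", default=[])
parser.add_argument("target")

args = parser.parse_args([
    "-p", "p1=5",
    "-s", "file=secrets.txt",
    "-i", "infile.txt=s3://mybucket/infile.txt",
    "training.py",
])
params = parse_key_values(args.params)
secrets = parse_key_values(args.secrets)
inputs = parse_key_values(args.inputs)
print(params, secrets, inputs, args.target)
```

Note that values arrive as strings in this sketch (p1 is "5", not 5); any type conversion would have to happen in a later step.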

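Running against remote code/function

The same code can be implemented as a remote HTTP endpoint, for example using nuclio serverless functions.

For example, the same code can be wrapped in a nuclio handler and executed remotely using the same CLI.

Function code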

from mlrun import get_or_create_ctx
import time


def handler(context, event):
    ctx = get_or_create_ctx('myfunc', event=event)
    p1 = ctx.get_param('p1', 1)
    p2 = ctx.get_param('p2', 'a-string')

    context.logger.info(
        f'Run: {ctx.name} uid={ctx.uid}:{ctx.iteration} Params: p1={p1}, p2={p2}')

    time.sleep(1)

    # log scalar values (KFP metrics)
    ctx.log_result('accuracy', p1 * 2)
    ctx.log_result('latency', p1 * 3)

    # log various types of artifacts (and set UI viewers)
    ctx.log_artifact('test.txt', body=b'abc is 123')
    ctx.log_artifact('test.html', body=b'<b> Some HTML <b>', viewer='web-app')

    context.logger.info('run complete!')
    return ctx.to_json()

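Function deployment

To deploy the function into a cluster you can run the following commands (make sure you have installed the nuclio-jupyter package first):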

import nuclio

spec = nuclio.ConfigSpec(cmd=['pip install git+https://github.com/v3io/mlrun.git'],
                         config={'spec.build.baseImage': 'python:3.6-jessie',
                                 'spec.triggers.web': {'kind': 'http', 'maxWorkers': 8}})

addr = nuclio.deploy_file('mycode.py', name='myfunc', project='mlrun', spec=spec)

Note: add this repo to nuclio build commands (pip install git+https://github.com/v3io/mlrun.git)

To execute the code remotely, just substitute the file name with the function URL:

python -m mlrun run -p p1=5 -s file=secrets.txt -i infile.txt=s3://mybucket/infile.txt http://<function-endpoint>
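Since the only difference between the local and remote commands is the last argument, a runner needs some way to tell a file name from a function URL. The sketch below shows one simple way to do that with the standard library; is_remote_target is a hypothetical helper and the example URL is a placeholder, and the source does not describe how mlrun itself makes this decision.

```python
from urllib.parse import urlparse


def is_remote_target(target):
    """Return True when the target looks like an HTTP(S) endpoint."""
    return urlparse(target).scheme in ("http", "https")


# A plain file name has no URL scheme, so it is treated as local code.
print(is_remote_target("training.py"))
# Placeholder endpoint, used only for illustration.
print(is_remote_target("http://example.invalid/myfunc"))
```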

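Running inside a Kubeflow pipeline

Running in a pipeline is similar to running from the command line. mlrun automatically saves outputs and artifacts in a way that is visible to Kubeflow, and allows steps to be interconnected.

See the pipelines notebook example.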

# run training using params p1 and p2, generate 2 registered outputs (model, dataset) to be listed in the pipeline UI
# user can specify the target path per output e.g. 'model.txt':'<some-path>', or leave blank to use the default out_path
def mlrun_train(p1, p2):
    return mlrun_op('training',
                    command=this_path + '/training.py',
                    params={'p1': p1, 'p2': p2},
                    outputs={'model.txt': '', 'dataset.csv': ''},
                    out_path=artifacts_path,
                    rundb=db_path)


# use data (model) from the first step as an input
def mlrun_validate(modelfile):
    return mlrun_op('validation',
                    command=this_path + '/validation.py',
                    inputs={'model.txt': modelfile},
                    out_path=artifacts_path,
                    rundb=db_path)

You can use the functions in a DAG:

@dsl.pipeline(
    name='My MLRUN pipeline',
    description='Shows how to use mlrun.'
)
def mlrun_pipeline(p1=5, p2='"text"'):
    # create a train step, apply v3io mount to it (will add the /User mount to the container)
    train = mlrun_train(p1, p2).apply(mount_v3io())

    # feed 1st step results into the second step
    # Note: the '.' in model.txt must be substituted with '-'
    validate = mlrun_validate(train.outputs['model-txt']).apply(mount_v3io())
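The note in the pipeline code about substituting '.' with '-' is easy to get wrong by hand, so a tiny helper can compute the output key from the file name. pipeline_output_key is a hypothetical helper for illustration; it encodes only the single rule stated in the comment above and is not part of mlrun or Kubeflow.

```python
def pipeline_output_key(filename):
    """Hypothetical helper: replace '.' with '-' in an output file name."""
    return filename.replace(".", "-")


print(pipeline_output_key("model.txt"))    # model-txt
print(pipeline_output_key("dataset.csv"))  # dataset-csv
```

With such a helper, the lookup in the pipeline could be written as train.outputs[pipeline_output_key('model.txt')] instead of hard-coding 'model-txt'.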

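Querying the run results and artifacts DB

If you have specified a rundb, the results and artifacts from each run are recorded.

You can use various db methods; see the example notebook.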

from mlrun import get_run_db

# connect to a local file DB
db = get_run_db('./').connect()

# list all runs
db.list_runs('').show()

# list all artifacts for version "latest"
db.list_artifacts('', tag='').show()

# check different artifact versions
db.list_artifacts('ch', tag='*').show()

# delete completed runs
db.del_runs(state='completed')


mlrun 0.1.6

