datarobot-batch-scoring

A script to score CSV files via DataRobot's prediction API.

https://coveralls.io/repos/github/datarobot/batch-scoring/badge.svg?branch=master
https://travis-ci.com/datarobot/batch-scoring.svg?branch=master
https://badge.fury.io/py/datarobot-batch-scoring.svg

Version compatibility

We aim to support as many DataRobot versions as possible with each batch_scoring release, but occasionally backend changes introduce incompatibilities. The chart below tracks compatibility between versions of this tool and DataRobot versions. If you are not sure which version of DataRobot you are using, contact DataRobot Support for assistance.

batch_scoring version    DataRobot version
<=1.10                   2.7, 2.8, 2.9
>=1.11, <1.13            3.0, 3.1+
>=1.13                   2.7, 2.8, 2.9, 3.0, 3.1+

The batch_scoring_deployment_aware command works only with newer DataRobot versions.

batch_scoring_deployment_aware version    DataRobot version
>=1.14                                    4.4+

How to install

Install or upgrade to the latest version:

$ pip install -U datarobot_batch_scoring

To install a specific version:

$ pip install datarobot_batch_scoring==x.y.z

Alternative installs

We publish two alternative installation methods on the releases page, for environments where internet access is restricted or Python is unavailable.

offlinebundle:

For performing installations in environments where Python 2.7 or Python 3+ is available but there is no internet access. Does not require administrative privileges or pip. Works on Linux, OSX, or Windows.

These files have “offlinebundle” in their name on the release page.

PyInstaller:

Using PyInstaller, we build a single-file executable that does not depend on Python. It depends only on libc and can be installed without administrative privileges. Right now we publish builds that work for most Linux distros made since CentOS 5. OSX and Windows are also supported.

These files have “executables” in their name on the release page.

Features

  • Concurrent requests (--n_concurrent)
  • Pause/resume
  • gzip support
  • Custom delimiters
  • Parallel processing
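As an illustration of the gzip support listed above, both the input dataset and the output file may be gzip-compressed (extension .gz). A minimal Python sketch of writing and reading such a file (the file name is hypothetical; batch_scoring handles this for you):

```python
import csv
import gzip
import os
import tempfile

# Hypothetical gzipped predictions file, as batch_scoring can produce with --out.
path = os.path.join(tempfile.mkdtemp(), "pred.csv.gz")

# Write a small gzipped CSV.
rows = [["row_id", "prediction"], ["0", "0.42"], ["1", "0.87"]]
with gzip.open(path, "wt", newline="") as f:
    csv.writer(f).writerows(rows)

# Read it back; gzip.open decompresses transparently in text mode.
with gzip.open(path, "rt", newline="") as f:
    data = list(csv.reader(f))

print(data[1])  # ['0', '0.42']
```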

Running batch_scoring, batch_scoring_sse, or batch_scoring_deployment_aware

You can execute the batch_scoring, batch_scoring_sse, and batch_scoring_deployment_aware commands with the relevant arguments from the command line, or you can pass arguments to the scripts from an .ini file. Put the .ini file in your home directory or in the directory from which you run the batch_scoring, batch_scoring_sse, or batch_scoring_deployment_aware command. Use the syntax and arguments below to define the parameters. Note that if you run the script and pass arguments on the command line, the command-line arguments take precedence.

The following table describes the syntax conventions; the syntax for running the scripts follows the table. DataRobot provides three scripts, each for a different application. Use:

  • batch_scoring to score on dedicated prediction instances.
  • batch_scoring_sse to score on standalone prediction instances. If you are unsure of your instance type, contact DataRobot Support.
  • batch_scoring_deployment_aware to score on dedicated prediction instances using a deployment_id instead of a project_id and model_id pair.
Convention    Meaning
[ ]           Optional argument
< >           User-supplied value
{ | }         Required, mutually exclusive

Required arguments:

batch_scoring --host=<host> --user=<user> <project_id> <model_id> <dataset_filepath> --datarobot_key=<datarobot_key> {--password=<pwd> | --api_token=<api_token>}

batch_scoring_deployment_aware --host=<host> --user=<user> <deployment_id> <dataset_filepath> --datarobot_key=<datarobot_key> {--password=<pwd> | --api_token=<api_token>}

batch_scoring_sse --host=<host> <import_id> <dataset_filepath>

Additional recommended arguments:

[--verbose] [--keep_cols=<keep_cols>] [--n_concurrent=<n_concurrent>]

Additional optional arguments:

[--out=<filepath>] [--api_version=<api_version>] [--pred_name=<string>] [--timeout=<timeout>] [--create_api_token] [--n_retry=<n_retry>] [--delimiter=<delimiter>] [--resume] [--no-resume] [--skip_row_id] [--output_delimiter=<delimiter>]

Argument descriptions: the following describes each argument:

Each argument below is marked with the prediction instance types it applies to: standalone, dedicated, or both.

host=<host> (standalone, dedicated)
    Specifies the hostname of the prediction API endpoint (the location of the data to use for predictions).
user=<user> (dedicated)
    Specifies the username used to acquire the API token. Use quotes if the name contains spaces.
<import_id> (standalone)
    Specifies the unique ID for the imported model. If unknown, ask your prediction administrator (the person responsible for the import procedure).
<project_id> (dedicated)
    Specifies the project identification string. You can find the ID embedded in the URL that displays when you are in the Leaderboard (for example, https://<host>/projects/<project_id>/models). Alternatively, when the prediction API is enabled, the project ID displays in the example shown when you click Deploy Model for a specific model in the Leaderboard.
<model_id> (dedicated)
    Specifies the model identification string. You can find the ID embedded in the URL that displays when you are in the Leaderboard and have selected a model (for example, https://<host>/projects/<project_id>/models/<model_id>). Alternatively, when the prediction API is enabled, the model ID displays in the example shown when you click Deploy Model for a specific model in the Leaderboard.
<deployment_id> (dedicated)
    Specifies the unique ID of the deployed model; can be used instead of the <project_id> and <model_id> pair.
<dataset_filepath> (standalone, dedicated)
    Specifies the .csv input file that the script scores. DataRobot scores models by submitting prediction requests against <dataset_filepath> using project <project_id> and model <model_id>.
datarobot_key=<datarobot_key> (dedicated)
    An additional datarobot_key for dedicated prediction instances. This argument is required when using on-demand workers on the Cloud platform, but not for Enterprise users.
password=<pwd> (dedicated)
    Specifies the password used to acquire the API token. Use quotes if the password contains spaces. You must specify either the password or the API token argument. To avoid entering your password each time you run the script, use the api_token argument instead.
api_token=<api_token> (dedicated)
    Specifies the API token for requests; if you do not have a token, you must specify the password argument. You can retrieve your token from your profile on the My Account page.
api_version=<api_version> (standalone, dedicated)
    Specifies the API version for requests. If omitted, defaults to the current latest. Override this if your DataRobot distribution does not support the latest API version. Valid options are predApi/1.0 and api/v1; predApi/1.0 is the default.
out=<filepath> (standalone, dedicated)
    Specifies the file name, and optionally path, to which the results are written. If not specified, the default file name is out.csv, written to the directory containing the script. The output file must be a single .csv file, which may be gzipped (extension .gz).
verbose (standalone, dedicated)
    Provides status updates while the script is running. It is recommended that you include this argument to track script execution progress. Silent mode (non-verbose), the default, displays very little output.
keep_cols=<keep_cols> (standalone, dedicated)
    Specifies the column names to append to the predictions. Enter as a comma-separated list.
max_prediction_explanations=<num> (standalone, dedicated)
    Specifies the number of top prediction explanations to generate for each prediction. If not specified, the default is 0 (no explanations). Compatible only with api_version predApi/1.0.
n_samples=<n_samples> (standalone, dedicated)
    Specifies the number of samples (rows) to use per batch. If not defined, the auto_sample option is used.
n_concurrent=<n_concurrent> (standalone, dedicated)
    Specifies the number of concurrent requests to submit. By default, the script submits four concurrent requests. Set n_concurrent to match the number of cores in the prediction API endpoint.
create_api_token (standalone, dedicated)
    Requests a new API token. To use this option, you must specify the password argument for this request (not the api_token argument). Specifying this argument invalidates your existing API token and creates and stores a new token for future prediction requests.
n_retry=<n_retry> (standalone, dedicated)
    Specifies the number of times DataRobot will retry if a request fails. A value of -1, the default, specifies an infinite number of retries.
pred_name=<pred_name> (standalone, dedicated)
    Applies a name to the prediction column of the output file. If you do not supply the argument, the column name is blank. For binary predictions, only positive class columns are included in the output. The last class (in lexical order) is used as the name of the prediction column.
skip_row_id (standalone, dedicated)
    Skips the row_id column in the output.
output_delimiter=<delimiter> (standalone, dedicated)
    Specifies the delimiter for the output CSV file. The special keyword "tab" can be used to indicate a tab-delimited CSV.
timeout=<timeout> (standalone, dedicated)
    The time, in seconds, that DataRobot tries to make a connection to satisfy a prediction request. When the timeout expires, the client (the batch_scoring or batch_scoring_sse command) closes the connection and retries, up to the number of times defined by the value of n_retry. The default value is 30 seconds.
delimiter=<delimiter> (standalone, dedicated)
    Specifies the delimiter to recognize in the input .csv file (e.g., --delimiter=","). If not specified, the script tries to automatically determine the delimiter. The special keyword "tab" can be used to indicate a tab-delimited CSV.
resume (standalone, dedicated)
    Starts the prediction from the point at which it was halted. If the prediction stopped, for example due to an error or network connection issue, you can run the same command with all the same arguments plus this resume argument. If you do not include this argument, and the script detects that a previous script was interrupted mid-execution, DataRobot prompts whether to resume. When resuming a script, you cannot change the dataset, project_id, model_id, n_samples, or out arguments.
no-resume (standalone, dedicated)
    Starts the prediction from scratch, disregarding any previous run.
help (standalone, dedicated)
    Shows usage help for the command.
fast (standalone, dedicated)
    Experimental: enables a faster .csv processor. Note that this method does not support multiline CSV files.
stdout (standalone, dedicated)
    Sends all log messages to stdout. If not specified, the command sends log messages to the datarobot_batch_scoring_main.log file.
auto_sample (standalone, dedicated)
    Overrides the n_samples value and instead uses chunks of roughly 2.5 MB to improve throughput. Enabled by default.
encoding (standalone, dedicated)
    Specifies the dataset encoding. If not provided, the batch_scoring or batch_scoring_sse script attempts to detect the encoding (e.g., "utf-8", "latin-1", or "iso2022_jp"). See the Python standard encodings for a list of valid values.
skip_dialect (standalone, dedicated)
    Specifies that the script skip CSV dialect detection and use the default "excel" dialect for CSV parsing. By default, the scripts detect the CSV dialect for proper batch generation on the client side.
ca_bundle=<ca_bundle> (standalone, dedicated)
    Specifies the path to a CA_BUNDLE file or directory with certificates of trusted Certificate Authorities (CAs) to be used for SSL verification. Note: if passed a path to a directory, the directory must have been processed using the c_rehash utility supplied with OpenSSL.
no_verify_ssl (standalone, dedicated)
    Disables SSL verification.
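The interplay of n_samples, n_concurrent, and n_retry described above can be pictured with a minimal sketch. This is an illustration of the batching idea only, not the tool's actual implementation; score_batch is a hypothetical stand-in for a real prediction API request:

```python
import concurrent.futures

def score_batch(batch, n_retry=3):
    """Hypothetical stand-in for one prediction request.
    Retries up to n_retry times if the request raises."""
    for attempt in range(n_retry):
        try:
            # A real client would POST `batch` to the prediction endpoint;
            # here we just return dummy "scores".
            return [len(row) for row in batch]
        except Exception:
            if attempt == n_retry - 1:
                raise

def score_rows(rows, n_samples=2, n_concurrent=4):
    """Split `rows` into batches of n_samples rows and score them
    with up to n_concurrent requests in flight at once."""
    batches = [rows[i:i + n_samples] for i in range(0, len(rows), n_samples)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_concurrent) as pool:
        results = list(pool.map(score_batch, batches))
    # Flatten per-batch results back into row order.
    return [score for batch in results for score in batch]

scores = score_rows([["a"], ["b", "c"], ["d"]], n_samples=2, n_concurrent=2)
print(scores)  # [1, 2, 1]
```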

Examples:

batch_scoring --host=https://mycorp.orm.datarobot.com/ --user="greg@mycorp.com" --out=pred.csv 5545eb20b4912911244d4835 5545eb71b4912911244d4847 /home/greg/Downloads/diabetes_test.csv
batch_scoring_sse --host=https://mycorp.orm.datarobot.com/ --out=pred.csv 0ec5bcea7f0f45918fa88257bfe42c09 /home/greg/Downloads/diabetes_test.csv
batch_scoring_deployment_aware --host=https://mycorp.orm.datarobot.com/ --user="greg@mycorp.com" --out=pred.csv 5545eb71b4912911244d4848 /home/greg/Downloads/diabetes_test.csv

Using a configuration file

When invoked, the batch_scoring command checks whether a batch_scoring.ini file exists in the directory from which the script is run (the working directory) and, if not found there, in $HOME/batch_scoring.ini (your home directory). If this file exists, the command uses the same parameters described above from it. If the file does not exist, the command runs normally using command-line arguments. Command-line arguments take precedence over file parameters (i.e., file parameters can be overridden from the command line).

The format of the batch_scoring configuration file is as follows:

[batch_scoring]
host=file_host
project_id=file_project_id
model_id=file_model_id
user=file_username
password=file_password
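The lookup order described above (working directory first, then the home directory) can be sketched with Python's standard configparser module. This is an illustrative sketch of the behavior, not the tool's own code:

```python
import configparser
import os
import tempfile

def load_batch_scoring_config(workdir, homedir):
    """Return the [batch_scoring] options from the first
    batch_scoring.ini found: working directory, then home."""
    for directory in (workdir, homedir):
        path = os.path.join(directory, "batch_scoring.ini")
        if os.path.exists(path):
            parser = configparser.ConfigParser()
            parser.read(path)
            return dict(parser["batch_scoring"])
    return {}  # no config file: rely on command-line arguments only

# Demo: an ini file only in the (temporary) home directory.
workdir = tempfile.mkdtemp()
homedir = tempfile.mkdtemp()
with open(os.path.join(homedir, "batch_scoring.ini"), "w") as f:
    f.write("[batch_scoring]\nhost=file_host\nuser=file_username\n")

config = load_batch_scoring_config(workdir, homedir)
print(config["host"])  # file_host
```

A file in the working directory would win over the home-directory one, mirroring the precedence the text describes.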

Usage notes

  • If the script detects that a previous script was interrupted mid-execution, it prompts whether to resume that execution.
  • If no interrupted script is detected, or if you indicate not to resume the previous execution, the script checks whether the specified output file exists. If it does, the script prompts for confirmation before overwriting the file.
  • The logs from each batch_scoring or batch_scoring_sse run are stored in the current working directory. All users see a datarobot_batch_scoring_main.log log file. Windows users see two additional log files, datarobot_batch_scoring_batcher.log and datarobot_batch_scoring_writer.log.
  • Scoring some datasets can fail because of limitations in the standard Python CSV parser. To work around this, add an index column to the dataset; the column is ignored during scoring but helps the parser.
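The resume behavior in the notes above can be pictured as skipping input rows that were already scored before the interruption. A minimal sketch of the idea (not the tool's actual checkpoint format; `score` is a hypothetical per-row scoring function):

```python
def resume_scoring(input_rows, partial_output, score):
    """Score only the rows not yet present in the partial output.
    Rows are matched by position, so the input dataset must be
    unchanged between runs (as the resume argument requires)."""
    done = len(partial_output)
    return partial_output + [score(row) for row in input_rows[done:]]

# Three input rows; one was already scored before the interruption.
rows = [[1], [2], [3]]
partial = [10]
result = resume_scoring(rows, partial, score=lambda r: r[0] * 10)
print(result)  # [10, 20, 30]
```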

Supported platforms

datarobot_batch_scoring is tested on Linux, Windows, and OS X. Python 2.7.x and Python 3.x are supported.

Proxy support

The batch scoring scripts honor the standard HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables:

export HTTP_PROXY=http://192.168.1.3:3128
export HTTPS_PROXY=http://192.168.1.3:3128
export NO_PROXY=noproxy.domain.com

