SDK for integrating cloud solutions such as SageMaker and Databricks with Hopsworks.




The Hopsworks Cloud SDK integrates existing cloud solutions, such as Amazon SageMaker or Databricks, with the Hopsworks platform.

It lets you access the Hopsworks Feature Store from SageMaker and Databricks notebooks.

Quick Start

Make sure your Hopsworks installation is set up correctly: Setting up Hopsworks for the cloud

To install:

```bash
pip install hopsworks-cloud-sdk
```

Sample usage:

^{pr2}$

Documentation

Hopsworks Feature Store

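Hopsworks has a data management layer for machine learning called a feature store. The feature store makes it simple and efficient to version, share, govern, and define features that can be used both to train machine learning models and to serve inference requests. The feature store is the natural interface between data engineering and data science.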
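To make the versioning idea concrete, here is a toy, in-memory sketch of a store that keeps every version of a feature group. It is not the Hopsworks API: the `ToyFeatureStore` class, its `register`/`latest_version`/`read` methods, and the `team_features` data are hypothetical, chosen only to illustrate how older versions stay readable after a new one is written.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


class ToyFeatureStore:
    """Hypothetical in-memory store: feature group name -> {version: rows}.

    Illustration only; this is not part of the hops / Hopsworks API.
    """

    def __init__(self) -> None:
        self._groups: Dict[str, Dict[int, List[Row]]] = {}

    def register(self, name: str, rows: List[Row]) -> int:
        """Store rows as a new version of `name` and return the new version number."""
        versions = self._groups.setdefault(name, {})
        version = max(versions, default=0) + 1
        # Copy rows so later changes by the caller cannot alter a stored version
        versions[version] = [dict(r) for r in rows]
        return version

    def latest_version(self, name: str) -> int:
        return max(self._groups[name])

    def read(self, name: str, version: Optional[int] = None) -> List[Row]:
        """Return a copy of the requested version, or the latest if omitted."""
        if version is None:
            version = self.latest_version(name)
        return [dict(r) for r in self._groups[name][version]]


# Hypothetical usage: write two versions, then read both back
store = ToyFeatureStore()
v1 = store.register("team_features", [{"team_id": 1, "team_budget": 100.0}])
v2 = store.register("team_features", [{"team_id": 1, "team_budget": 120.0}])
old_rows = store.read("team_features", v1)
new_rows = store.read("team_features")
```

Because each write creates a new version instead of overwriting, a model trained on version 1 can still be reproduced after version 2 exists.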

API documentation

Reading from the Feature Store

```python
from hops import featurestore

features_df = featurestore.get_features(["team_budget", "average_attendance", "average_player_age"])
```
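`get_features` takes only a list of feature names. Features requested together may live in different feature groups, so they have to be joined on a common key. The plain-Python sketch below illustrates that inner join. The feature-group contents, the `team_id` join key, and the `join_feature_groups` helper are hypothetical and are not part of the hops API, which returns a DataFrame rather than a list of dicts.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def join_feature_groups(groups: List[List[Row]], join_key: str,
                        features: List[str]) -> List[Row]:
    """Inner-join feature groups on join_key and keep only the requested features.

    Hypothetical helper for illustration; not part of the hops API.
    """
    if not groups:
        return []
    # Index each group by its join-key value (assumes one row per key per group)
    indexed = [{row[join_key]: row for row in group} for group in groups]
    # Keep only keys present in every group (inner-join semantics)
    common_keys = set(indexed[0])
    for index in indexed[1:]:
        common_keys &= set(index)
    result = []
    for key in sorted(common_keys):
        merged: Row = {}
        for index in indexed:
            merged.update(index[key])
        # Project the merged row down to the requested feature columns
        result.append({name: merged[name] for name in features})
    return result


# Two hypothetical feature groups that share a team_id column
teams_budget = [{"team_id": 1, "team_budget": 100.0},
                {"team_id": 2, "team_budget": 80.0}]
teams_attendance = [{"team_id": 1, "average_attendance": 5000.0},
                    {"team_id": 2, "average_attendance": 4200.0},
                    {"team_id": 3, "average_attendance": 3900.0}]

features_rows = join_feature_groups([teams_budget, teams_attendance], "team_id",
                                    ["team_budget", "average_attendance"])
```

Team 3 has no budget row, so the inner join drops it; a real feature store may offer other join types.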

Integration with Scikit-learn

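```python
from hops import featurestore
from sklearn.neighbors import KNeighborsClassifier

# Read the feature group as a pandas DataFrame
train_df = featurestore.get_featuregroup("iris_features", dataframe_type="pandas")
x_df = train_df[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']]
y_df = train_df[["label"]]
X = x_df.values
y = y_df.values.ravel()  # flatten the single-column label frame to a 1-D array

iris_knn = KNeighborsClassifier()
iris_knn.fit(X, y)
```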

Integration with TensorFlow

```python
import tensorflow as tf
from hops import featurestore

# Illustrative hyperparameters (not specified in the original example); tune as needed
SHUFFLE_BUFFER_SIZE = 1000
BATCH_SIZE = 32
NUM_EPOCHS = 10

feature_list = ["team_budget", "average_attendance", "average_player_age",
                "team_position", "sum_attendance", "average_player_rating",
                "average_player_worth", "sum_player_age", "sum_player_rating",
                "sum_player_worth", "sum_position", "average_position"]

# Create the next version of the training dataset from the listed features
latest_version = featurestore.get_latest_training_dataset_version("team_position_prediction")
featurestore.create_training_dataset(
    features=feature_list,
    training_dataset="team_position_prediction",
    descriptive_statistics=False,
    feature_correlation=False,
    feature_histograms=False,
    cluster_analysis=False,
    training_dataset_version=latest_version + 1)


def create_tf_dataset():
    dataset_dir = featurestore.get_training_dataset_path("team_position_prediction")
    # tf.io.* names work in TensorFlow 2 as well as recent 1.x releases
    input_files = tf.io.gfile.glob(dataset_dir + "/part-r-*")
    dataset = tf.data.TFRecordDataset(input_files)
    tf_record_schema = ...  # Add tf schema
    feature_names = ["team_budget", "average_attendance", "average_player_age",
                     "sum_attendance", "average_player_rating", "average_player_worth",
                     "sum_player_age", "sum_player_rating", "sum_player_worth",
                     "sum_position", "average_position"]
    label_name = "team_position"

    def decode(example_proto):
        # Parse one serialized record and split it into features and label
        example = tf.io.parse_single_example(example_proto, tf_record_schema)
        x = []
        for feature_name in feature_names:
            x.append(example[feature_name])
        y = [tf.cast(example[label_name], tf.float32)]
        return x, y

    dataset = dataset.map(decode).shuffle(SHUFFLE_BUFFER_SIZE).batch(BATCH_SIZE).repeat(NUM_EPOCHS)
    return dataset


tf_dataset = create_tf_dataset()
```
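The `decode` function turns each parsed record into a feature list `x` and a one-element float label `y`, after which the dataset is shuffled, batched, and repeated. The plain-Python sketch below mirrors that data flow without TensorFlow. It assumes records are already parsed into dicts, and `decode_record` and `batches` are hypothetical helpers, not TensorFlow APIs. One deliberate simplification: it shuffles each full epoch, whereas `tf.data` shuffles through a bounded buffer of `SHUFFLE_BUFFER_SIZE` elements.

```python
import random
from typing import Any, Dict, Iterator, List, Sequence, Tuple

Record = Dict[str, Any]
Example = Tuple[List[Any], List[float]]


def decode_record(record: Record, feature_names: Sequence[str],
                  label_name: str) -> Example:
    """Split a parsed record into (features, [label]), mirroring the TF decode step."""
    x = [record[name] for name in feature_names]
    y = [float(record[label_name])]
    return x, y


def batches(records: Sequence[Record], feature_names: Sequence[str],
            label_name: str, batch_size: int, num_epochs: int,
            seed: int = 0) -> Iterator[List[Example]]:
    """Yield batches epoch by epoch: decode, shuffle, then batch.

    As with shuffle().batch().repeat(), batches never span two epochs and
    the last batch of an epoch may be smaller than batch_size.
    """
    rng = random.Random(seed)
    decoded = [decode_record(r, feature_names, label_name) for r in records]
    for _ in range(num_epochs):
        epoch = decoded[:]
        rng.shuffle(epoch)  # reshuffle every epoch
        for start in range(0, len(epoch), batch_size):
            yield epoch[start:start + batch_size]


# Hypothetical records: five teams, two features, integer position label
records = [{"team_budget": 100.0 + i, "average_attendance": 5000.0 + i,
            "team_position": i} for i in range(5)]
all_batches = list(batches(records, ["team_budget", "average_attendance"],
                           "team_position", batch_size=2, num_epochs=3))
```

With five records and a batch size of 2, each epoch yields batches of 2, 2, and 1 examples, so three epochs produce nine batches.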

Feature Visualization

[Figure: Visualizing feature distributions]
[Figure: Visualizing feature correlations]

Development Notes

For development details, such as how to run the tests and build the documentation, see: Development
