Ease-of-use utility tools for Databricks notebooks.
Databricks-utils
databricks-utils is a Python package that provides several utility classes and functions that improve ease of use in Databricks notebooks.
Installation
pip install databricks-utils
Features
Documentation
API documentation can be found at https://e2fyi.github.io/databricks-utils/.
Quick start
S3Bucket
import json
from databricks_utils.aws import S3Bucket

# need to attach notebook's dbutils
# before S3Bucket can be used
S3Bucket.attach_dbutils(dbutils)

# create an instance of the s3 bucket
bucket = (S3Bucket("somebucketname", "SOMEACCESSKEY", "SOMESECRETKEY")
          .allow_spark(sc)  # local spark context
          .mount("somebucketname"))  # mount location name (resolves as `/mnt/somebucketname`)

# show list of files/folders in the bucket "resource" folder
bucket.ls("resource/")

# read in a json file from the bucket
# (the "r" mode is an argument to open(), not to bucket.local())
with open(bucket.local("resource/somefile.json"), "r") as f:
    data = json.load(f)

# read from parquet via spark
dataframe = spark.read.parquet(bucket.s3("resource/somedf.parquet"))

# umount
bucket.umount()
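The two path helpers in the example feed different readers: the result of `bucket.local(...)` goes to Python's `open`, while the result of `bucket.s3(...)` goes to Spark. The sketch below illustrates that distinction. It assumes the usual Databricks convention that a DBFS mount at `/mnt/<name>` is visible to local file APIs under `/dbfs/mnt/<name>`, and that Spark reads S3 through an `s3a://` URI. The functions `local_path` and `s3_uri` are hypothetical and not part of databricks-utils; the strings the library actually returns may differ.

```python
import posixpath


def local_path(mount_name: str, rel_path: str) -> str:
    """Hypothetical helper: a path usable by local file APIs such as open()."""
    # Assumption: DBFS mounts appear under /dbfs on the driver's filesystem.
    return posixpath.join("/dbfs/mnt", mount_name, rel_path.lstrip("/"))


def s3_uri(bucket_name: str, rel_path: str) -> str:
    """Hypothetical helper: a URI that Spark readers can consume."""
    # Assumption: the cluster accesses S3 through the s3a:// scheme.
    return "s3a://{}/{}".format(bucket_name, rel_path.lstrip("/"))


print(local_path("somebucketname", "resource/somefile.json"))
print(s3_uri("somebucketname", "resource/somedf.parquet"))
```

Keeping the two forms separate avoids handing an `s3a://` URI to `open`, or a `/dbfs/...` path to a Spark reader that expects a URI.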
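Vega
Vega and Vega-Lite are high-level grammars for interactive graphics. They provide a concise JSON syntax for rapidly generating visualizations to support analysis.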
from databricks_utils.vega import vega_embed

# vega-lite spec for a bar chart
spec = {
    "data": {
        "values": [
            {"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43},
            {"a": "D", "b": 91}, {"a": "E", "b": 81}, {"a": "F", "b": 53},
            {"a": "G", "b": 19}, {"a": "H", "b": 87}, {"a": "I", "b": 52}
        ]
    },
    "mark": "bar",
    "encoding": {
        "x": {"field": "a", "type": "ordinal"},
        "y": {"field": "b", "type": "quantitative"}
    }
}

# plot out the vega chart in databricks notebook
displayHTML(vega_embed(spec=spec))
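Because a Vega-Lite spec is plain JSON-like data, it can be assembled from records already held in Python instead of being typed by hand. The following is a minimal sketch of that idea; `bar_chart_spec` is a hypothetical helper and not part of databricks-utils, and it covers only the simple bar chart layout used in the example.

```python
import json


def bar_chart_spec(rows, x_field, y_field):
    """Hypothetical helper: build a Vega-Lite bar chart spec from a list of dicts."""
    return {
        "data": {"values": list(rows)},
        "mark": "bar",
        "encoding": {
            "x": {"field": x_field, "type": "ordinal"},
            "y": {"field": y_field, "type": "quantitative"},
        },
    }


rows = [{"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43}]
spec = bar_chart_spec(rows, "a", "b")

# The spec is JSON-serializable, so it can be inspected before rendering.
print(json.dumps(spec, indent=2))
```

In a Databricks notebook, the resulting `spec` could then be rendered the same way as the hand-written one, with `displayHTML(vega_embed(spec=spec))`.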
Developer
# add a version to git tag and publish to pypi
. add_tag.sh <VERSION>