I installed the Databricks CLI by running the following command:
pip install databricks-cli
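Use the pip that matches your Python installation; if you are using Python 3, run pip3 instead.
Then, after creating a PAT (a personal access token in Databricks), I run the following bash script (.sh):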
#!/bin/bash
# You can run this on Windows as well; just convert it to a batch file.
# Note: you need the Databricks CLI installed and a token configured.
echo "Creating DBFS directory"
dbfs mkdirs dbfs:/databricks/packages
echo "Uploading cluster init script"
dbfs cp --overwrite python_dependencies.sh dbfs:/databricks/packages/python_dependencies.sh
echo "Listing DBFS directory"
dbfs ls dbfs:/databricks/packages
The python_dependencies.sh script:
#!/bin/bash
# Restart cluster after running.
sudo apt-get install applicationinsights=0.11.9 -V -y
sudo apt-get install azure-servicebus=0.50.2 -V -y
sudo apt-get install azure-storage-file-datalake=12.0.0 -V -y
sudo apt-get install humanfriendly=8.2 -V -y
sudo apt-get install mlflow=1.8.0 -V -y
sudo apt-get install numpy=1.18.3 -V -y
sudo apt-get install opencensus-ext-azure=1.0.2 -V -y
sudo apt-get install packaging=20.4 -V -y
sudo apt-get install pandas=1.0.3 -V -y
sudo apt update
sudo apt-get install scikit-learn=0.22.2.post1 -V -y
status=$?
echo "The date command exit status : ${status}"
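I use the script above as the cluster's init script to install the Python libraries.
My problem is that although everything appears to work and the cluster starts successfully, the libraries are not installed correctly. When I click the cluster's Libraries tab, I see:
Thanks for any help and comments.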
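I found a solution based on @RedCricket's comment.
The .sh file above installs all the Python dependencies when the cluster starts, so the libraries do not have to be reinstalled each time the notebook is re-run.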
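The package names in python_dependencies.sh are PyPI packages, which apt-get does not install. Below is a minimal sketch of an init script that installs them with pip instead. It assumes that the fix was to switch to pip and that the cluster's pip lives at /databricks/python/bin/pip; neither is stated in the post. The PIP and REQUIREMENTS variable names are hypothetical.

```shell
#!/bin/bash
# Sketch of a cluster init script that installs pinned PyPI packages with pip.
# ASSUMPTION: the cluster's Python environment provides pip at this path;
# override PIP if yours differs.
PIP="${PIP:-/databricks/python/bin/pip}"

# Write the pinned versions from the original script to a requirements file.
# pip pins versions with '==' (apt-get uses a single '=').
REQUIREMENTS="$(mktemp)"
cat > "$REQUIREMENTS" <<'EOF'
applicationinsights==0.11.9
azure-servicebus==0.50.2
azure-storage-file-datalake==12.0.0
humanfriendly==8.2
mlflow==1.8.0
numpy==1.18.3
opencensus-ext-azure==1.0.2
packaging==20.4
pandas==1.0.3
scikit-learn==0.22.2.post1
EOF

if [ -x "$PIP" ]; then
  "$PIP" install -r "$REQUIREMENTS"
  status=$?
  echo "pip install exit status: ${status}"
else
  # Off-cluster (or if the assumed path is wrong) nothing is installed.
  echo "pip not found at ${PIP}; skipping install" >&2
fi
```

The upload script shown earlier would copy this file to dbfs:/databricks/packages unchanged. The cluster still has to be restarted for the init script to run.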