Posted 2024-04-19 18:53:31
User
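Scenario: We create a virtual environment and install everything in requirements.txt, but a few files end up outside the environment directory.
Use case: We want to zip this environment and use it for the Spark driver and executors.
Problem: Because a few files are installed outside the virtual environment directory, Spark fails with module-not-found exceptions or reports that lib*.so files are unavailable.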
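To solve this, I took the following steps.
Blog write-up: https://kshitij-kuls.com/2019/08/04/setting-up-virtual-environment-for-pyspark/
Before proceeding, we need to understand the basic layout of a Python virtual environment: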
├── bin
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── activate_this.py
│   ├── easy_install
│   ├── easy_install-3.6
│   ├── pip
│   ├── pip3
│   ├── pip3.6
│   ├── python
│   ├── python-config
│   ├── python3 -> python
│   ├── python3.6 -> python
│   └── wheel
├── include
│   └── python3.6m -> /usr/include/python3.6m
├── lib
│   └── python3.6
│       ├── site-packages
│       ├── lib-dynload -> /usr/lib/python3.6/lib-dynload [Dynamic Library]
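The symlinks in the layout above that point outside the environment are the ones that break once the folder is shipped to another machine. A minimal sketch for listing them, assuming a POSIX filesystem; the function name `find_external_symlinks` and the directory name `env` are illustrative, not something from the original post:

```python
import os


def find_external_symlinks(venv_root):
    """Return sorted (link, target) pairs for symlinks resolving outside venv_root.

    `link` is relative to venv_root, `target` is the resolved absolute path.
    """
    root = os.path.realpath(venv_root)
    external = []
    # os.walk does not descend into symlinked directories by default,
    # but it still lists them in `dirnames`, so we can inspect them.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            # A target is "external" when the environment root is not its ancestor.
            if os.path.commonpath([root, target]) != root:
                external.append((os.path.relpath(path, root), target))
    return sorted(external)


if __name__ == "__main__":
    # "env" is an assumed directory name; adjust to your environment.
    for link, target in find_external_symlinks("env"):
        print(f"{link} -> {target}")
```

Internal links such as bin/python3 -> python are not reported, because they resolve to a path inside the environment.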
Environment variables:
PYSPARK_PYTHON : Points to the executable python file: bin/python
LD_LIBRARY_PATH : Points to the dynamic library path: lib/python3.6/lib-dynload [All .so* files]
PYTHONPATH : Points to the packages installed in the virtual environment as well as the dynamic library path: lib/python3.6/site-packages<CPS>lib/python3.6/lib-dynload [All .py files]
PYTHONHOME : Points to the Python library path: lib/python3.6/site-packages
Steps to build the virtual environment:
Install the desired version of Python on the machine.
Create the virtual env: virtualenv env -p /usr/local/bin/python3
Activate the virtual env: source env/bin/activate
Install requirements: pip install numpy
Here is the trick. Look at this line of the tree:
├── lib-dynload -> /usr/lib/python3.6/lib-dynload
It is a symbolic link that points to a path on the local machine, so if you simply zip the virtual environment folder, these dependencies will be missing on the cluster. Therefore:
Copy all the .so* files from /usr/lib/python3.6/lib-dynload, /usr/lib64/*.so.*, etc. to lib/python3.6/lib-dynload
Copy all the .py files from /usr/lib/python3.6/lib-dynload, /usr/lib64/*.so.*, etc. to lib/python3.6/site-packages
Prepare the zip. Run this from the root directory of the virtual environment, env/ in our case: zip -rq ../venv.zip *
Upload the zip to the /udf folder for tdss: /tookitaki/tdss/udf/
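The copy-then-zip steps can also be scripted. The following is only a sketch under assumptions: it handles directory symlinks such as lib-dynload by replacing the link with a real copy of its target, and it builds the archive with Python's `zipfile` instead of the `zip` command. The helper names `materialize_symlinked_dir` and `zip_directory` are made up for illustration, and copying the extra /usr/lib64 libraries is left out.

```python
import os
import shutil
import zipfile


def materialize_symlinked_dir(link_path):
    """Replace a directory symlink with a real directory containing copies
    of the target's files. Returns True if a replacement was made."""
    if not (os.path.islink(link_path) and os.path.isdir(link_path)):
        return False  # not a symlink, a broken link, or a link to a file
    target = os.path.realpath(link_path)
    os.unlink(link_path)               # removes only the link, not the target
    shutil.copytree(target, link_path)  # copies file contents, following links
    return True


def zip_directory(root, zip_path):
    """Archive the contents of `root` (not `root` itself), similar in spirit
    to running `zip -rq ../venv.zip *` inside it.

    `zip_path` must live outside `root`, otherwise the archive would try to
    include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                zf.write(full, os.path.relpath(full, root))


# Example usage (paths are assumptions matching the layout in this post):
#   materialize_symlinked_dir("env/lib/python3.6/lib-dynload")
#   zip_directory("env", "venv.zip")
```

Materializing the link first matters because `os.walk` does not descend into symlinked directories, so an unreplaced lib-dynload link would contribute nothing to the archive.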
Environment variable settings
For the driver: spark.yarn.appMasterEnv.[Environment variable]
For the executors: spark.executorEnv.[Environment variable]
PYSPARK_PYTHON
pyspark.spark.yarn.appMasterEnv.PYSPARK_PYTHON = venv/bin/python
pyspark.spark.executorEnv.PYSPARK_PYTHON = venv/bin/python
PYTHONHOME
pyspark.spark.yarn.appMasterEnv.PYTHONHOME = venv/lib64/python3.6/site-packages
pyspark.spark.executorEnv.PYTHONHOME = venv/lib64/python3.6/site-packages
LD_LIBRARY_PATH
pyspark.spark.yarn.appMasterEnv.LD_LIBRARY_PATH = venv/lib64/python3.6/lib-dynload
pyspark.spark.executorEnv.LD_LIBRARY_PATH = venv/lib64/python3.6/lib-dynload
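The settings so far all follow one pattern: the same variable is set once under the spark.yarn.appMasterEnv. prefix and once under the spark.executorEnv. prefix. A small sketch that generates both keys from one dictionary is given here; the helper names are hypothetical, and the leading `pyspark.` seen in the entries above is assumed to be specific to the platform used in this post, so it is omitted.

```python
def spark_env_conf(env_vars):
    """Expand {NAME: value} into Spark conf keys for the YARN application
    master (where the driver runs in cluster mode) and for the executors."""
    conf = {}
    for name, value in env_vars.items():
        conf["spark.yarn.appMasterEnv." + name] = value
        conf["spark.executorEnv." + name] = value
    return conf


def as_submit_args(conf):
    """Render a conf dict as `--conf key=value` arguments for spark-submit."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # Values taken from the settings listed in this post.
    conf = spark_env_conf({
        "PYSPARK_PYTHON": "venv/bin/python",
        "PYTHONHOME": "venv/lib64/python3.6/site-packages",
        "LD_LIBRARY_PATH": "venv/lib64/python3.6/lib-dynload",
    })
    print(" ".join(as_submit_args(conf)))
```

Keeping the variables in one dictionary makes it harder for the driver and executor values to drift apart.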
PYTHONPATH
This needs to be included in YARN-ENV-ENTRIES; it is not set from the Spark configuration.
PYTHONPATH = {{PWD}}/__venv__.zip<CPS>{{PWD}}/__py4j-0.10.7-src__.zip<CPS>venv/lib64/python3.6/site-packages<CPS>venv/lib64/python3.6/lib-dynload<CPS>
To run Python:
cd venv
export PYTHONPATH=lib64/python3.6/site-packages:lib64/python3.6/lib-dynload/
export LD_LIBRARY_PATH=lib64/python3.6/lib-dynload
source bin/activate
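The same local check can be driven from Python instead of an interactive shell. This is a sketch under assumptions: the function names are illustrative, the default `lib64/python3.6` sub-path mirrors the exports above, and paths are joined onto the environment root rather than relying on `cd venv`.

```python
import os
import subprocess


def local_run_env(venv_root, lib_dir="lib64/python3.6", base_env=None):
    """Build an environment dict equivalent to the two `export` lines above."""
    env = dict(os.environ if base_env is None else base_env)
    site_packages = os.path.join(venv_root, lib_dir, "site-packages")
    lib_dynload = os.path.join(venv_root, lib_dir, "lib-dynload")
    env["PYTHONPATH"] = os.pathsep.join([site_packages, lib_dynload])
    env["LD_LIBRARY_PATH"] = lib_dynload
    return env


def check_import(venv_root, module):
    """Return True if the environment's interpreter can import `module`."""
    python = os.path.join(venv_root, "bin", "python")
    result = subprocess.run(
        [python, "-c", "import " + module],
        env=local_run_env(venv_root),
    )
    return result.returncode == 0
```

For example, `check_import("venv", "numpy")` would try the package installed in the steps above; a failure here usually shows up more quickly than a failed Spark job.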