Python virtual environment [a few Python libraries are installed outside the virtual environment directory]

Posted 2024-04-19 18:53:31


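Scenario: We create a virtual environment and install everything in requirements.txt, but a few files end up outside the environment directory.

Use case: We want to zip this environment and use it for the Spark driver and executors.

Problem: Because a few files are installed outside the virtual environment directory, Spark fails with a module-not-found exception or reports that lib*.so files are unavailable.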


1 answer
User
#1 · Posted 2024-04-19 18:53:31

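To solve this problem, I took the steps below.

I also wrote them up in a blog post: https://kshitij-kuls.com/2019/08/04/setting-up-virtual-environment-for-pyspark/

Before going further, we need to understand the basic layout of a Python virtual environment: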

├── bin
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── activate_this.py
│   ├── easy_install
│   ├── easy_install-3.6
│   ├── pip
│   ├── pip3
│   ├── pip3.6
│   ├── python
│   ├── python-config
│   ├── python3 -> python
│   ├── python3.6 -> python
│   └── wheel
├── include
│   └── python3.6m -> /usr/include/python3.6m
├── lib
│   └── python3.6
│       ├── site-packages
│       └── lib-dynload -> /usr/lib/python3.6/lib-dynload [Dynamic Library]
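
The layout above contains symlinks that point outside the environment (include/python3.6m and lib-dynload). As a rough sketch, not part of the original answer, the following lists every symlink under an environment whose target resolves outside it. The function name `find_external_symlinks` is hypothetical, and the code assumes a POSIX file system.

```python
import os

def find_external_symlinks(env_root):
    """Return sorted (link_path, target) pairs for symlinks that resolve outside env_root."""
    root = os.path.realpath(env_root)
    external = []
    for dirpath, dirnames, filenames in os.walk(root):
        # os.walk does not descend into symlinked directories by default,
        # but it still lists them in dirnames, so check both lists.
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                target = os.path.realpath(path)
                # A target is "external" when env_root is not its ancestor.
                if os.path.commonpath([root, target]) != root:
                    external.append((path, target))
    return sorted(external)
```

Calling `find_external_symlinks("env")` on an environment like the one shown would be expected to report the lib-dynload and python3.6m links, which are the entries that will not travel with a zip of the folder.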

Environment variables:

PYSPARK_PYTHON : Points to the executable python file: bin/python

LD_LIBRARY_PATH : Points to the dynamic library path: lib/python3.6/lib-dynload [All .so* files]

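PYTHONPATH : Points to the packages installed in the virtual environment as well as the dynamic library path: lib/python3.6/site-packages<CPS>lib/python3.6/lib-dynload [All .py files]

PYTHONHOME : Points to the Python library path: lib/python3.6/site-packages

Steps to build the virtual environment: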

Install the desired version of Python on the machine.
Create the virtual environment:
    virtualenv env -p /usr/local/bin/python3
Activate the virtual environment:
    source env/bin/activate
Install the requirements:
    pip install numpy

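Here is the trick. Look at this line in the layout:

├── lib-dynload -> /usr/lib/python3.6/lib-dynload

It is a symlink that points to a path on the local machine, so if you simply zip the virtual environment folder, these dependencies will be missing on the cluster.

So you need to copy all the .so* files from /usr/lib/python3.6/lib-dynload, /usr/lib64/*.so.*, etc. into lib/python3.6/lib-dynload, and copy all the .py files from /usr/lib/python3.6/lib-dynload, /usr/lib64/*.so.*, etc. into lib/python3.6/site-packages.

Run the zip step from the root directory of the virtual environment, which in our case is env/.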
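
One possible way to do the lib-dynload part of that copy, offered as a sketch under assumptions rather than the answer's own method: replace the symlink with a real copy of the directory it points to. The helper name `materialize_symlinked_dir` is hypothetical, and the sketch does not cover the extra /usr/lib64 libraries or the .py files mentioned above.

```python
import os
import shutil

def materialize_symlinked_dir(link_path):
    """Replace a symlink to a directory with a real copy of that directory.

    Returns True if a copy was made, False if link_path is not a symlink.
    """
    if not os.path.islink(link_path):
        return False  # already a real directory (or missing); nothing to do
    target = os.path.realpath(link_path)
    if not os.path.isdir(target):
        raise NotADirectoryError(target)  # fail before touching the link
    os.unlink(link_path)                # removes only the link, not the target
    shutil.copytree(target, link_path)  # copy the files into the environment
    return True
```

For the environment in this answer the call would be `materialize_symlinked_dir("env/lib/python3.6/lib-dynload")`, after which the zipped folder carries the .so files itself.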

Prepare zip
zip -rq ../venv.zip *
Upload the zip to the /udf folder for tdss: /tookitaki/tdss/udf/
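
For readers who prefer to build the archive from Python, here is a minimal sketch of the same idea using the standard `zipfile` module. It is an illustration, not a drop-in replacement for the `zip` command: the function name `zip_env` is hypothetical, and the archive should be written outside the environment directory so it does not try to include itself.

```python
import os
import zipfile

def zip_env(env_root, zip_path):
    """Zip the contents of env_root with entry names relative to it (bin/python, lib/...)."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        # followlinks=True walks into symlinked directories, so their files
        # are stored as regular entries instead of being skipped.
        for dirpath, _dirnames, filenames in os.walk(env_root, followlinks=True):
            for name in filenames:
                full = os.path.join(dirpath, name)
                zf.write(full, os.path.relpath(full, env_root))
    return zip_path
```

Storing names relative to the environment root matches running `zip` from inside env/, which is what lets the later settings refer to paths such as venv/bin/python.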

Setting the environment variables

For the driver: spark.yarn.appMasterEnv.[Environment variable]

For the executors: spark.executorEnv.[Environment variable]

PYSPARK_PYTHON

pyspark.spark.yarn.appMasterEnv.PYSPARK_PYTHON = venv/bin/python
pyspark.spark.executorEnv.PYSPARK_PYTHON = venv/bin/python

PYTHONHOME

pyspark.spark.yarn.appMasterEnv.PYTHONHOME = venv/lib64/python3.6/site-packages
pyspark.spark.executorEnv.PYTHONHOME = venv/lib64/python3.6/site-packages

LD_LIBRARY_PATH

pyspark.spark.yarn.appMasterEnv.LD_LIBRARY_PATH = venv/lib64/python3.6/lib-dynload
pyspark.spark.executorEnv.LD_LIBRARY_PATH = venv/lib64/python3.6/lib-dynload
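
Since every variable is set twice, once per prefix, the pairs can be generated rather than typed. The sketch below is only an illustration: `spark_env_conf` is a hypothetical helper, and it emits plain `spark.yarn.appMasterEnv.*` and `spark.executorEnv.*` keys. The extra leading `pyspark.` in the settings above looks specific to the author's own tooling and is left out here on that assumption.

```python
def spark_env_conf(env_vars):
    """Expand {NAME: value} into Spark properties for both the driver (YARN AM) and executors."""
    conf = {}
    for name, value in env_vars.items():
        conf["spark.yarn.appMasterEnv." + name] = value
        conf["spark.executorEnv." + name] = value
    return conf

# Values taken from the settings listed above.
venv_vars = {
    "PYSPARK_PYTHON": "venv/bin/python",
    "PYTHONHOME": "venv/lib64/python3.6/site-packages",
    "LD_LIBRARY_PATH": "venv/lib64/python3.6/lib-dynload",
}
conf = spark_env_conf(venv_vars)
```

Each resulting key and value can then be passed to spark-submit as a `--conf key=value` argument.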

PYTHONPATH

This one needs to be included in YARN-ENV-ENTRIES; it is not set from the Spark configuration.

PYTHONPATH = {{PWD}}/__venv__.zip<CPS>{{PWD}}/__py4j-0.10.7-src__.zip<CPS>venv/lib64/python3.6/site-packages<CPS>venv/lib64/python3.6/lib-dynload<CPS>

To run python:

cd venv

export PYTHONPATH=lib64/python3.6/site-packages:lib64/python3.6/lib-dynload/

export LD_LIBRARY_PATH=lib64/python3.6/lib-dynload

source bin/activate
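
Before shipping the archive, it may help to check locally that an interpreter started with these variables can import the packages you installed. The following is a hedged sketch of such a check, not something from the original answer: `can_import` is a hypothetical helper, and `python_exe` is assumed to be an absolute path to the environment's interpreter.

```python
import os
import subprocess

def can_import(python_exe, env_root, module):
    """Run `import module` in a child interpreter started from env_root
    with the PYTHONPATH and LD_LIBRARY_PATH values used above."""
    env = dict(os.environ)
    env["PYTHONPATH"] = os.pathsep.join(
        ["lib64/python3.6/site-packages", "lib64/python3.6/lib-dynload"]
    )
    env["LD_LIBRARY_PATH"] = "lib64/python3.6/lib-dynload"
    result = subprocess.run(
        [python_exe, "-c", "import " + module],
        cwd=env_root,  # relative paths above are resolved from here
        env=env,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

For example, `can_import("/abs/path/to/venv/bin/python", "/abs/path/to/venv", "numpy")` returning False would suggest that something numpy needs is still missing from the folder.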
