I created a package that I want to pass to every executor node

Posted 2024-04-25 16:39:24

I have created a Python package that I use in my main Python file, which is run on a YARN cluster with spark-submit. These are the steps I followed:

1) Suppose I have a package named auditing, which has the subpackages abc_pkg_1 and abc_pckg_2.
2) I have a main file, test.py, that uses this package.
3) I created an egg file for the auditing package using a setup.py located outside the package directory.
4) I ran spark-submit with --py-files dist/auditing-0.0.1-py3.6.egg

setup.py (used to build the egg file):

from setuptools import setup, find_packages

setup(
    name="auditing",
    version="0.0.1",
    author="Example Author",
    packages=find_packages()
)

test.py:

from auditing import Driver

Error in the YARN logs:

ModuleNotFoundError: No module named 'auditing'

Command used to create the egg file:

python3 setup.py bdist_egg
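
One thing worth checking at this point is whether the built egg actually contains the auditing package and its subpackages. find_packages() only picks up directories that contain an __init__.py, so a missing __init__.py would leave a subpackage out of the egg. Since an egg is a zip archive, its contents can be listed with the standard library. The following is only a minimal sketch; the helper name packages_in_egg is hypothetical, and the egg path in the usage note is the one from step 4.

```python
import posixpath
import zipfile


def packages_in_egg(egg_path):
    """Return sorted dotted names of the packages found inside an egg (zip) file.

    A directory counts as a package here if it holds an __init__.py
    (or its compiled __init__.pyc). Hypothetical helper, for illustration only.
    """
    with zipfile.ZipFile(egg_path) as archive:
        names = archive.namelist()

    packages = set()
    for name in names:
        directory, filename = posixpath.split(name)
        if directory and filename in ("__init__.py", "__init__.pyc"):
            packages.add(directory.replace("/", "."))
    return sorted(packages)
```

Calling packages_in_egg("dist/auditing-0.0.1-py3.6.egg") should list auditing and both subpackages if the egg was built as intended; if auditing itself is missing from the result, the import error would be expected regardless of how the egg is shipped.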

It does not work even in the pyspark shell; I get the same module-not-found error.
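
To narrow down where the import fails (on the driver, on the executors, or both), it can help to record what each Python process sees. The sketch below uses only the standard library; probe_import is a hypothetical helper name, not part of Spark or of the auditing package.

```python
import importlib.util
import socket
import sys


def probe_import(module_name):
    """Report whether module_name can be found by the current Python process.

    Returns the host name, whether the module was found, where it was found,
    and the sys.path that was searched. Hypothetical helper for diagnosis.
    """
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        # Raised for dotted names whose parent package is missing,
        # or for modules with a broken __spec__.
        spec = None
    return {
        "host": socket.gethostname(),
        "module": module_name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "sys_path": list(sys.path),
    }
```

On the driver, print(probe_import("auditing")) shows whether the egg is on sys.path. Assuming a live SparkContext named sc (as in the pyspark shell), something like sc.parallelize(range(4), 4).map(lambda _: probe_import("auditing")).collect() should return the same report from the executor side; the partition count of 4 is an arbitrary choice.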

