Retrieving a SparkSession from SQL Server

Posted 2024-03-28 18:14:47


I am running Spark in a Docker environment and need to connect to it from a PySpark script executed on SQL Server 2017:

exec sp_execute_external_script 
@language =N'Python',
@script=N'from pyspark.sql.session import SparkSession
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL data source example") \
    .master("local[1]") \
    .getOrCreate()'
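
Note that .master("local[1]") makes PySpark launch its own JVM on the SQL Server host instead of attaching to the Spark instance running in Docker. If the intent is to use the containerized cluster, the builder would point at its master URL instead; a minimal sketch, assuming a standalone master exposed at the hypothetical address spark://docker-host:7077 (a driver-side JVM and the Spark jars are still required on the SQL Server machine):

exec sp_execute_external_script 
@language =N'Python',
@script=N'from pyspark.sql.session import SparkSession

# Point the builder at the Spark master published by the Docker container
# (hypothetical host and port; replace with the real address of the container).
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL data source example") \
    .master("spark://docker-host:7077") \
    .getOrCreate()

# Simple sanity check: print the Spark version to the script output.
print(spark.version)'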

I have also set JAVA_HOME correctly on the local machine.
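
External scripts are executed by the SQL Server Launchpad service under its own worker accounts, so a JAVA_HOME set for the interactive user is not necessarily visible to them. A quick diagnostic sketch to confirm what the worker process actually sees:

exec sp_execute_external_script 
@language =N'Python',
@script=N'import os

# Print the environment variables the external-script worker actually sees.
# A JAVA_HOME set only for the interactive user may not show up here.
print("JAVA_HOME =", os.environ.get("JAVA_HOME"))
print("SPARK_HOME =", os.environ.get("SPARK_HOME"))
print("PYSPARK_PYTHON =", os.environ.get("PYSPARK_PYTHON"))'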

But running the script above gives me the following error:

An external script error occurred: 
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.SQLEXPRESS2017\PYTHON_SERVICES\lib\site-packages\pyspark\context.py", line 133, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.SQLEXPRESS2017\PYTHON_SERVICES\lib\site-packages\pyspark\context.py", line 316, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.SQLEXPRESS2017\PYTHON_SERVICES\lib\site-packages\pyspark\java_gateway.py", line 46, in launch_gateway
    return _launch_gateway(conf)
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.SQLEXPRESS2017\PYTHON_SERVICES\lib\site-packages\pyspark\java_gateway.py", line 108, in _launch_gateway
    raise Exception("Java gateway process exited before sending its port number")
Exception: Java gateway process exited before sending its port number

Check the output for more information.

Missing Python executable 'python', defaulting to 'C:\Program Files\Microsoft SQL Server\MSSQL14.SQLEXPRESS2017\PYTHON_SERVICES\Lib\site-packages\pyspark\bin\..' for SPARK_HOME environment variable. Please install Python or specify the correct Python executable in PYSPARK_DRIVER_PYTHON or PYSPARK_PYTHON environment variable to detect SPARK_HOME safely. Failed to find Spark jars directory. You need to build Spark before running this program.

Any idea what I am missing?
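
For reference, the variables named in that message could be set inside the script itself, before pyspark launches its Java gateway. A minimal sketch with hypothetical paths for Java and Spark (the python.exe location under PYTHON_SERVICES is inferred from the traceback above and would need to be verified):

exec sp_execute_external_script 
@language =N'Python',
@script=N'import os

# Hypothetical paths; adjust to the Java and Spark installations that are
# actually present and readable by the Launchpad worker account.
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jre1.8.0_202"
os.environ["SPARK_HOME"] = r"C:\spark"
os.environ["PYSPARK_PYTHON"] = r"C:\Program Files\Microsoft SQL Server\MSSQL14.SQLEXPRESS2017\PYTHON_SERVICES\python.exe"
os.environ["PYSPARK_DRIVER_PYTHON"] = os.environ["PYSPARK_PYTHON"]

# Import and build the session only after the variables are in place.
from pyspark.sql.session import SparkSession
spark = SparkSession \
    .builder \
    .appName("Python Spark SQL data source example") \
    .master("local[1]") \
    .getOrCreate()
print(spark.version)'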

