PySpark raises a Java gateway exception if a file is opened before getting the SparkContext()

import pyspark
from pprint import pprint
from pyspark import SparkConf

def getproperties():
    """Get Spark configuration properties in python dictionary"""
    global properties
    properties = dict()
    with open('myspark_config.properties') as f:
        for line in f:
            if not line.startswith('#') and not line.startswith('\n'):
                tokens = line.split('=')
                tokens[0] = tokens[0].strip()
                tokens[1] = "=".join(tokens[1:])
                properties[tokens[0]] = tokens[1].strip()
        f.close()
    pprint(properties)
    return(properties)

properties = getproperties()

conf = (SparkConf()
        .setMaster(properties["spark_master_url"])
        .setAppName("testApp")
        .set('spark.cores.max',properties["spark_app_cores"])
        .set('spark.executor.memory',properties["spark_app_memory"])
        .set('spark.dynamicAllocation.enabled','true')
        .set('spark.shuffle.service.enabled','true')
        )

# conf = (SparkConf()
#         .setMaster("spark://remote:port")
#         .setAppName("testApp")
#         .set('spark.cores.max',"2")
#         .set('spark.executor.memory',"2G")
#         .set('spark.dynamicAllocation.enabled','true')
#         .set('spark.shuffle.service.enabled','true')
#         )

sc = pyspark.SparkContext(conf=conf)

1 Answer

User

#1 · Posted 2024-04-25 14:38:36

But, I want to read spark_master_ip and spark.cores.max from a .properties file (instead of hard coding it).

That's a great idea, but you are overlooking that this is exactly what $SPARK_HOME/conf/spark-defaults.conf is for. Just put the properties you need there.
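As a rough illustration of what that file could contain, the sketch below renders the settings from the question in the spark-defaults.conf layout, which is one property per line with the key and value separated by whitespace. The helper name to_spark_defaults is hypothetical, and "spark://remote:port" is the placeholder from the question's commented-out block, not a real master URL.

```python
def to_spark_defaults(settings):
    """Render a dict as spark-defaults.conf text: one 'key value' pair per line."""
    return "".join(f"{key} {value}\n" for key, value in settings.items())


# Values copied from the commented-out SparkConf block in the question.
settings = {
    "spark.master": "spark://remote:port",  # placeholder URL, replace with your own
    "spark.cores.max": "2",
    "spark.executor.memory": "2G",
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
}

text = to_spark_defaults(settings)
print(text, end="")
```

With these lines saved in $SPARK_HOME/conf/spark-defaults.conf, the script would presumably only need to set the application name on its SparkConf, so no file has to be opened before the SparkContext is created.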

but I get the following Java gateway exception,

This doesn't look right:

"=".join(tokens[1:])

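Why would you use = inside a property? Unless a value contains =, that join has no effect.

Python also provides a parser for this kind of file: https://docs.python.org/3/library/configparser.html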
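A minimal sketch of the configparser route, assuming the properties file has no section headers (configparser requires at least one, so a dummy section is prepended). The function name parse_properties, the section name "root", and the sample text are hypothetical; the keys mirror the ones used in the question.

```python
import configparser


def parse_properties(text):
    """Parse 'key = value' lines into a dict using configparser."""
    parser = configparser.ConfigParser(
        interpolation=None,   # treat '%' in values literally
        delimiters=("=",),    # do not split on ':' (URLs contain it)
    )
    parser.optionxform = str  # keep key case instead of lowercasing
    # configparser needs a section header, so prepend a dummy one.
    parser.read_string("[root]\n" + text)
    return dict(parser["root"])


# Hypothetical file contents; lines starting with '#' are skipped by default.
sample = """\
# comment line
spark_master_url = spark://remote:port
spark_app_cores = 2
spark_app_memory = 2G
extra_opts = -Dkey=value
"""

props = parse_properties(sample)
print(props)
```

Each line is split only at the first =, so a value such as -Dkey=value survives intact without the manual split and join. For a real file, the text could be read with open(...).read() and passed to the same function.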
