<p>You can pass any dependencies using the <a href="https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L176" rel="noreferrer"><code>spark.jars.packages</code></a> property in <code>$SPARK_HOME/conf/spark-defaults.conf</code> (setting <a href="https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala#L174" rel="noreferrer"><code>spark.jars</code></a> should work as well). It should be a comma-separated list of Maven coordinates.</p>
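<p>For example, a minimal <code>spark-defaults.conf</code> entry might look like this (the coordinate shown is just the spark-csv package used later in this answer):</p>
<pre><code>spark.jars.packages  com.databricks:spark-csv_2.11:1.2.0
</code></pre>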
<p>Packages or classpath properties have to be set before the JVM is started, and <a href="https://github.com/apache/spark/blob/branch-1.6/python/pyspark/conf.py#L104" rel="noreferrer">this happens during <code>SparkConf</code> initialization</a>. That means the <code>SparkConf.set</code> method cannot be used here.</p>
<p>Alternatively, you can set the <code>PYSPARK_SUBMIT_ARGS</code> environment variable before the <code>SparkConf</code> object is initialized:</p>
<pre><code>import os
from pyspark import SparkConf, SparkContext
SUBMIT_ARGS = "--packages com.databricks:spark-csv_2.11:1.2.0 pyspark-shell"
os.environ["PYSPARK_SUBMIT_ARGS"] = SUBMIT_ARGS
conf = SparkConf()
sc = SparkContext(conf=conf)
</code></pre>
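<p>The <code>--packages</code> value is a comma-separated list of Maven coordinates in <code>groupId:artifactId:version</code> form, and the trailing <code>pyspark-shell</code> token tells PySpark to launch its own JVM. As a sketch, the submit-args string above could be built from a list of coordinates with a small helper (this function is purely illustrative, not part of any Spark API):</p>
<pre><code>def build_submit_args(coordinates, extra=""):
    """Build a PYSPARK_SUBMIT_ARGS value from Maven coordinates.

    coordinates: iterable of "groupId:artifactId:version" strings.
    extra: any additional spark-submit flags, passed through verbatim.
    """
    parts = []
    if coordinates:
        # --packages expects a single comma-separated list
        parts.append("--packages " + ",".join(coordinates))
    if extra:
        parts.append(extra)
    # pyspark-shell must come last so PySpark starts the JVM itself
    parts.append("pyspark-shell")
    return " ".join(parts)

# build_submit_args(["com.databricks:spark-csv_2.11:1.2.0"])
# -&gt; "--packages com.databricks:spark-csv_2.11:1.2.0 pyspark-shell"
</code></pre>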