Python: connecting Apache Spark (PySpark) to Redshift


I am connecting PySpark to Redshift by following this guide:

Spark Redshift with Python

I created a folder, downloaded

RedshiftJDBC42-1.2.12.1017.jar, and created a Python file sample.py with the code below:

from pyspark.sql import SparkSession, HiveContext

aws_access_key = "xxxx"
aws_secret_key = "xxxxyyyy"
bucket = "redshiftbucketadrian"

# Build a Hive-enabled session against YARN
spark = SparkSession.builder.master("yarn").appName("Connect to redshift").enableHiveSupport().getOrCreate()
sc = spark.sparkContext
sql_context = HiveContext(sc)

# S3 credentials for the temporary directory used by the spark-redshift connector
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", aws_access_key)
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", aws_secret_key)

# Read the Redshift table through the spark-redshift data source
df = sql_context.read\
    .format("com.databricks.spark.redshift")\
    .option("url", "jdbc:redshift://xxxxx")\
    .option("dbtable", "dev")\
    .option("tempdir", "s3n://xxxx/")\
    .load()
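
For reference, the code above never tells Spark where to find the downloaded RedshiftJDBC42-1.2.12.1017.jar or the spark-redshift connector itself. Below is a minimal sketch of one way to attach both when building the session; the local jar path and the com.databricks:spark-redshift_2.11:2.0.1 Maven coordinate are illustrative assumptions, not taken from the original setup.

from pyspark.sql import SparkSession

# Hypothetical path to the driver downloaded earlier -- adjust to your folder
redshift_jdbc_jar = "/path/to/RedshiftJDBC42-1.2.12.1017.jar"

spark = (
    SparkSession.builder
    .master("yarn")
    .appName("Connect to redshift")
    # Ship the Redshift JDBC driver with the application
    .config("spark.jars", redshift_jdbc_jar)
    # Resolve the spark-redshift connector from Maven (coordinate is an assumption)
    .config("spark.jars.packages", "com.databricks:spark-redshift_2.11:2.0.1")
    .enableHiveSupport()
    .getOrCreate()
)

If the script is launched through spark-submit, these are usually passed on the command line with --jars and --packages instead, since spark.jars.packages set in code can be picked up too late once the JVM is already running.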

Then I ran the following command:

[command omitted in the original post]

But it keeps showing me this:


.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-04 21:27:35 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-04 21:27:37 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2018-04-04 21:27:39 INFO  Client:871 - Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

What am I missing?

