在Livy会话语句中传递pySpark脚本

data = { 'code': textwrap.dedent(""" import random NUM_SAMPLES = 100000 def sample(p): x, y = random.random(), random.random() return 1 if x*x + y*y < 1 else 0 count = sc.parallelize(xrange(0, NUM_SAMPLES)).map(sample).reduce(lambda a, b: a + b) print "Pi is roughly %f" % (4.0 * count / NUM_SAMPLES) """) } r = requests.post(statements_url, data=json.dumps(data), headers=headers)

1条回答

网友

1楼 · 发布于 2024-06-08 08:14:25

我不确定这是否回答了您的问题，但我成功地使用cURL在EMR上创建了一个Spark会话，如下所示：

$ curl -H "Content-Type: application/json" -X POST -d '{"kind":"pyspark", "conf": {"spark.yarn.dist.pyFiles": "s3://bucket-name/test.py"}}' http://ec2-3-87-28-125.compute-1.amazonaws.com:8998/sessions
{"id":0,"name":null,"appId":null,"owner":null,"proxyUser":null,"state":"starting","kind":"pyspark","appInfo":{"driverLogUrl":null,"sparkUiUrl":null},"log":["stdout: ","\nstderr: ","\nYARN Diagnostics: "]}

我检查了/mnt/var/log/livy/livy-livy-server.out，发现这一行表示会话已成功创建：

20/08/31 18:02:25 INFO InteractiveSession: Interactive session 0 created [appid: application_1598896609416_0002, owner: null, proxyUser: None, state: idle, kind: pyspark, info: {driverLogUrl=http://ip-172-31-85-247.ec2.internal:8042/node/containerlogs/container_1598896609416_0002_01_000001/livy, sparkUiUrl=http://ip-172-31-95-182.ec2.internal:20888/proxy/application_1598896609416_0002/}]

相关问题更多 >

编程相关推荐

热门问题

热门文章