pyspark saveAsTextFile works with Python 2.7 but not with Python 3.4

User

Reply 1 · edited 2024-05-15 17:37:05

Try this:

export PYSPARK_PYTHON=python3
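
If exporting the variable in the shell is inconvenient, it can probably also be set from inside the driver script, as long as that happens before the SparkContext is created. The sketch below assumes client mode and that an executable named python3 is on the PATH of every node; the helper name use_interpreter is made up for illustration.

```python
import os


def use_interpreter(python_path="python3"):
    """Point PySpark at the given interpreter (hypothetical helper).

    This has to run before a SparkContext is created, because PySpark
    reads PYSPARK_PYTHON when the context starts.
    """
    os.environ["PYSPARK_PYTHON"] = python_path
    return os.environ["PYSPARK_PYTHON"]


chosen = use_interpreter("python3")
print("PYSPARK_PYTHON =", chosen)

# Assumed next step, only meaningful where pyspark is installed:
#   from pyspark import SparkContext
#   sc = SparkContext(appName="example")
```

Note that this only selects which interpreter gets launched; it does not install Python 3 on nodes that lack it.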

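User

Reply 2 · edited 2024-05-15 17:37:05

Your EMR cluster may be configured to run pyspark with Python 2.7 while your code targets Python 3.4, and that mismatch can cause problems.

The link below describes how to configure Amazon EMR to run Spark with Python 3.4: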

I know Python 3.4.3 is installed on an Amazon EMR cluster instances, but the default Python version used by Spark and other programs is Python 2.7.10. How do I change the default Python version to Python 3 and run a pyspark job?

https://aws.amazon.com/premiumsupport/knowledge-center/emr-pyspark-python-3x/

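The range() function is implemented differently in Python 2 and Python 3.

In Python 2, range() returns a list of numbers.
In Python 3, range() returns a lazy range object (loosely, a generator) instead of a list.

So when you use Python 3, the input you provide is a range object, not a list.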
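
A minimal sketch of that difference, written to run on Python 3. The variable names are illustrative only. Wrapping the result in list() is one way to get the same value on both versions.

```python
numbers = range(5)

# On Python 3 this prints False: range() gives a lazy range object.
# On Python 2 the same line would print True, because range() builds a list.
print(isinstance(numbers, list))

# list() materializes the values and behaves the same on both versions.
as_list = list(numbers)
print(as_list)  # [0, 1, 2, 3, 4]
```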

More on the differences between range() in Python 2 and Python 3:

Python 2: https://docs.python.org/2/library/functions.html#range
range(start, stop[, step])

This is a versatile function to create lists containing arithmetic progressions. It is most often used in for loops. The arguments must be plain integers. If the step argument is omitted, it defaults to 1. If the start argument is omitted, it defaults to 0. The full form returns a list of plain integers [start, start + step, start + 2 * step, ...]. If step is positive, the last element is the largest start + i * step less than stop; if step is negative, the last element is the smallest start + i * step greater than stop. step must not be zero (or else ValueError is raised).

Example:

>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Python 3: https://docs.python.org/3/library/functions.html#func-range
range(start, stop[, step])

Rather than being a function, range is actually an immutable sequence type, as documented in Ranges and Sequence Types — list, tuple, range.
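
To illustrate what "immutable sequence type" means in practice, here is a short Python 3 sketch. It is not taken from the linked documentation. It shows that a range supports length, indexing, membership tests, and slicing without building a list, and that item assignment is rejected.

```python
r = range(0, 10, 2)

print(r)             # range(0, 10, 2)
print(len(r))        # 5
print(r[1])          # 2
print(6 in r)        # True
print(list(r[1:3]))  # [2, 4]

# Unlike a one-shot generator, a range can be iterated more than once.
first_pass = list(r)
second_pass = list(r)
print(first_pass == second_pass)  # True

# Ranges are immutable: item assignment raises TypeError.
try:
    r[0] = 99
    mutated = True
except TypeError:
    mutated = False
print(mutated)  # False
```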


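User

Reply 3 · edited 2024-05-15 17:37:05

OK, it turns out this has nothing to do with Python 3 and everything to do with my conda environment. In short, I set up a conda environment in bootstrap.sh, but I only actually activated it on the master node. So the master node uses the conda python, while the workers use the system python.

My current solution is to set PYSPARK_PYTHON=/home/hadoop/miniconda3/envs/myenv/python.

Is there a better way to activate the conda environment on the worker nodes?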
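
One way to confirm a master/worker mismatch like this is to ask the executors which interpreter they run. The sketch below is an assumption-laden diagnostic, not a fix: interpreter_info and worker_interpreters are made-up helper names, and sc is assumed to be an existing SparkContext. If the driver and workers differ in minor Python version, PySpark will typically fail the job with a version mismatch error rather than return results, which is itself a useful signal.

```python
import sys


def interpreter_info(_=None):
    """Return (executable path, 'major.minor') for the Python running this call."""
    return (sys.executable, "%d.%d" % sys.version_info[:2])


def worker_interpreters(sc, num_partitions=8):
    """Collect the distinct interpreters seen by executors.

    `sc` is assumed to be an existing SparkContext; this function is
    hypothetical and only uses parallelize/map/distinct/collect.
    """
    return (
        sc.parallelize(range(num_partitions), num_partitions)
        .map(interpreter_info)
        .distinct()
        .collect()
    )


# The driver side can be checked without Spark at all.
driver_info = interpreter_info()
print("driver:", driver_info)

# On a cluster (assumed usage):
#   print("workers:", worker_interpreters(sc))
```

If the worker list shows the system python while the driver shows the conda python, the environment is not being picked up on the workers.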
