How do I install numpy and pandas for Python 3.5 in Spark?

Published 2024-05-23 08:51:18


I am trying to run a linear regression in Spark with Python 3.5 instead of Python 2.7, so I first exported PYSPARK_PYTHON=python3. I then got the error "No module named numpy". I tried `pip install numpy`, but pip does not pick up the PYSPARK_PYTHON setting. How do I get pip to install numpy for 3.5? Thank you.

$ export PYSPARK_PYTHON=python3

$ spark-submit linreg.py
....
Traceback (most recent call last):
  File "/home/yoda/Code/idenlink-examples/test22-spark-linreg/linreg.py", line 115, in <module>
from pyspark.ml.linalg import Vectors
  File "/home/yoda/install/spark/python/lib/pyspark.zip/pyspark/ml/__init__.py", line 22, in <module>
  File "/home/yoda/install/spark/python/lib/pyspark.zip/pyspark/ml/base.py", line 21, in <module>
  File "/home/yoda/install/spark/python/lib/pyspark.zip/pyspark/ml/param/__init__.py", line 26, in <module>
  ImportError: No module named 'numpy'

$ pip install numpy
Requirement already satisfied: numpy in /home/yoda/.local/lib/python2.7/site-packages

$ pyspark
Python 3.5.2 (default, Nov 17 2016, 17:05:23) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
17/02/09 20:29:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/02/09 20:29:20 WARN Utils: Your hostname, yoda-VirtualBox resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
17/02/09 20:29:20 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
17/02/09 20:29:31 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Python version 3.5.2 (default, Nov 17 2016 17:05:23)
SparkSession available as 'spark'.
>>> import site; site.getsitepackages()
['/usr/local/lib/python3.5/dist-packages', '/usr/lib/python3/dist-packages', '/usr/lib/python3.5/dist-packages']
>>> 
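The transcript above is the whole problem in miniature: the bare `pip` command is bound to Python 2.7 (it reports numpy already present in the python2.7 site-packages), while `pyspark` runs Python 3.5. A way around this is to invoke pip through the interpreter you actually want to install for, since `python3 -m pip` guarantees packages land in that interpreter's site-packages (a sketch; flags like `--user` depend on how your Python was set up):

```shell
# Check which interpreter the bare `pip` command is bound to
pip --version

# Compare with pip invoked through the python3 interpreter explicitly
python3 -m pip --version

# Install for python3 specifically, so pyspark (running 3.5) can see it
python3 -m pip install --user numpy pandas
```

After this, `spark-submit` with `PYSPARK_PYTHON=python3` should find numpy, because the package was installed into the same interpreter's search path.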

2 Answers

I don't really think this is a Spark problem; it sounds like you need help with your environment. As the commenter mentioned, you need to set up a python3 environment, activate it, and then install numpy. See this for some help with managing environments. Once you have the python3 environment set up, activate it, then run pip install numpy or {} and you should be good to go.
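The advice above can be sketched as follows (a minimal sketch using the standard `venv` module; the environment path is illustrative, and `PYSPARK_PYTHON` must point at the same interpreter the packages were installed for):

```shell
# Create an isolated Python 3 environment (the path is illustrative)
python3 -m venv "$HOME/venvs/spark-py3"

# Activate it; `python` and `pip` now both point inside this environment
. "$HOME/venvs/spark-py3/bin/activate"

# Packages install for THIS interpreter, not the system Python 2.7
pip install numpy pandas

# Point Spark at the same interpreter before submitting the job
export PYSPARK_PYTHON="$HOME/venvs/spark-py3/bin/python"
spark-submit linreg.py
```

The key point is that the activated environment's `pip` and the interpreter named in `PYSPARK_PYTHON` are one and the same, so there is no way for the install target and the runtime to drift apart.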

If you are running the job locally, you only need to upgrade pyspark.

With Homebrew: brew upgrade pyspark should resolve most of the dependencies.
