Is there a solution to this collect() problem in PySpark?
I want to get the file names out of a zip file, but the collect() call fails:
from pyspark import SparkContext
from pyspark.sql import SparkSession
import zipfile
import io

sc = SparkContext()
spark = SparkSession(sc)
path = "/Desktop/SparkProject/ok.zip"

def zip_extract(x):
    # x is a (path, bytes) pair as produced by sc.binaryFiles
    in_memory_data = io.BytesIO(x[1])
    file_obj = zipfile.ZipFile(in_memory_data, "r")
    files = file_obj.namelist()
    return dict(zip(files, [file_obj.open(f).read() for f in files]))

zips = sc.binaryFiles(path)
files_data = zips.map(zip_extract).collect()
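For what it's worth, the mapper's logic can be checked outside Spark by feeding it a (name, bytes) pair built from an in-memory zip; the sample file names below are illustrative, not from my actual archive:

```python
import io
import zipfile

def zip_extract(x):
    # x mimics the (path, bytes) pair that sc.binaryFiles yields
    in_memory_data = io.BytesIO(x[1])
    file_obj = zipfile.ZipFile(in_memory_data, "r")
    files = file_obj.namelist()
    return {name: file_obj.open(name).read() for name in files}

# Build a small zip in memory to exercise the function
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("a.txt", "hello")
    zf.writestr("b.txt", "world")

result = zip_extract(("ok.zip", buf.getvalue()))
print(sorted(result))  # -> ['a.txt', 'b.txt']
```

Run locally, this returns the expected filename-to-contents dict, so the extraction itself seems fine; the failure appears only when Spark runs it.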
It fails with the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (DESKTOP-11E4VBV executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
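From what I have read, "Python worker failed to connect back" usually means the executor could not launch a Python worker process, often because Spark picks up a different interpreter than the driver (common on Windows). A sketch of the workaround I am assuming applies, set before creating the SparkContext:

```python
import os
import sys

# Point Spark's workers at the same interpreter running the driver;
# an interpreter mismatch is a commonly reported cause of
# "Python worker failed to connect back".
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
```

I have not confirmed this resolves the error in my setup, so any pointers are welcome.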