PySpark DataFrame pickle exception: expected zero arguments for construction of ClassDict (for pyspark.ml.linalg.DenseVector)

Posted 2024-05-23 14:25:54


I am unable to select from a DataFrame that was produced by transforming with a OneVsRest classifier model.

Error description: net.razorvine.pickle.PickleException: expected zero arguments for construction of ClassDict (for pyspark.ml.linalg.DenseVector)

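from pyspark.sql.functions import array
from pyspark.ml import Pipeline
from pyspark.ml.feature import CountVectorizer, StringIndexer
from pyspark.ml.classification import LogisticRegression, OneVsRest

df = spark.sql("select distinct * from MasterData table")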
df.select("*").show(5)


df = df.withColumn("featureval", array(df["featureval"]))

countVectors = CountVectorizer(inputCol="featureval", outputCol="features1", vocabSize=10000, minDF=5)
label_stringIdx = StringIndexer(inputCol="Type", outputCol="label")
pipeline = Pipeline(stages=[countVectors, label_stringIdx])
pipelineFit = pipeline.fit(df)
dataset = pipelineFit.transform(df)


dataset.select("*").show(5)


(train, test) = dataset.randomSplit([0.8, 0.2])

lr = LogisticRegression(maxIter=10, tol=1E-6, fitIntercept=True)
ovr = OneVsRest(classifier=lr)
# Fit on the training split so the test rows stay unseen by the model
ovrModel = ovr.fit(train)
predictions = ovrModel.transform(test)

Now, when I try to display the DataFrame with predictions.select("*").show(10), I get the pickle exception.

Any help would be greatly appreciated! Thanks in advance.
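
One way to narrow down a failure like this is to materialize the output columns one at a time and record which ones raise. The sketch below is not a fix and does not assume a particular cause. The helper name `find_failing_columns` is hypothetical, and it relies only on the `columns`, `select`, and `take` members that a PySpark DataFrame provides. It also assumes column names contain no dots or other characters that would need backtick quoting.

```python
def find_failing_columns(df, n=10):
    """Return {column_name: first line of the error} for columns that fail.

    `df` is expected to behave like a PySpark DataFrame: it needs a
    `columns` attribute, a `select(name)` method, and `take(n)` on the result.
    """
    failures = {}
    for name in df.columns:
        try:
            # take() makes Spark compute the rows needed for this one column,
            # so columns that do not feed into it are left out of the job.
            df.select(name).take(n)
        except Exception as exc:  # broad on purpose: wrapped JVM errors vary
            # Keep only the first line, since JVM stack traces are long.
            lines = str(exc).splitlines()
            failures[name] = lines[0] if lines else type(exc).__name__
    return failures
```

Calling `find_failing_columns(predictions)` on the DataFrame above would return a dictionary keyed by the columns that could not be collected. If only the columns added by `ovrModel.transform` appear in it, that would point at the OneVsRest step rather than the earlier pipeline stages.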

