pyspark.sql.utils.IllegalArgumentException: 'requirement failed: Invalid initial capacity'


I'm trying to run cross-validation with a decision tree in Spark using the ML library, but I get this error when calling cv.fit(train_dataset):

pyspark.sql.utils.IllegalArgumentException: u'requirement failed: Invalid initial capacity'

I haven't found much information about what could cause it, other than the DataFrame being empty, but mine isn't. Here is my code:

import pandas as pd
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import ParamGridBuilder, CrossValidator

# abalone.data has no header row, so read it with header=None
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data', header=None)
df.columns = ['Sex', 'Length', 'Diameter', 'Height', 'Whole weight', 'Shucked weight', 'Viscera weight', 'Shell weight', 'Rings']
# sqlContext is assumed to already exist (e.g. from the PySpark shell)
train_dataset = sqlContext.createDataFrame(df)

column_types = train_dataset.dtypes

categoricalCols = []
numericCols = []

# Split columns into categorical (string) and numeric based on their Spark dtypes
for ct in column_types:
    if ct[1] == 'string':
        categoricalCols += [ct[0]]
    else:
        numericCols += [ct[0]]

stages = []
for categoricalCol in categoricalCols:
    stringIndexer = StringIndexer(inputCol=categoricalCol, outputCol=categoricalCol+"Index")
    stages += [stringIndexer]

# In Python 3, map() returns an iterator, so build the list with a comprehension
assemblerInputs = [c + "Index" for c in categoricalCols] + numericCols
assembler = VectorAssembler(inputCols=assemblerInputs, outputCol="features")
stages += [assembler]

labelIndexer = StringIndexer(inputCol='Rings', outputCol='indexedLabel')
stages += [labelIndexer]

dt = DecisionTreeClassifier(labelCol="indexedLabel", featuresCol="features")

evaluator = MulticlassClassificationEvaluator(labelCol='indexedLabel', predictionCol='prediction', metricName='f1')

paramGrid = (ParamGridBuilder()
             .addGrid(dt.maxDepth, [1,2,6])
             .addGrid(dt.maxBins, [20,40])
             .build())

stages += [dt]
pipeline = Pipeline(stages=stages)

# numFolds=1 is what triggers the error (see the answer below)
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=paramGrid, evaluator=evaluator, numFolds=1)

cvModel = cv.fit(train_dataset)
train_dataset = cvModel.transform(train_dataset)

I'm running Spark in standalone mode locally. What is going wrong?

Thanks!


Tags: df, evaluator, dt, train_dataset, cv, spark, weight
1 Answer

So, the problem was setting CrossValidator's numFolds parameter to 1; CrossValidator requires at least 2 folds. If I only want to do parameter tuning with a ParamGrid over a single split, apparently I need to use TrainValidationSplit instead.
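
As a minimal sketch of that fix, assuming the pipeline, paramGrid, and evaluator defined in the question (the trainRatio value here is an arbitrary choice), the CrossValidator line can be swapped for a TrainValidationSplit:

from pyspark.ml.tuning import TrainValidationSplit

# TrainValidationSplit evaluates each ParamGrid combination on a single
# train/validation split instead of k folds; trainRatio=0.8 (an assumed
# value) uses 80% of the rows for training and 20% for validation
tvs = TrainValidationSplit(estimator=pipeline,
                           estimatorParamMaps=paramGrid,
                           evaluator=evaluator,
                           trainRatio=0.8)

tvsModel = tvs.fit(train_dataset)
train_dataset = tvsModel.transform(train_dataset)

Alternatively, keeping CrossValidator but setting numFolds to 2 or more also avoids the error.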
