Why does cross_val_score return all NaN?

    df.describe(include = "all")

              TRXN_MONTH  TRANSACTION_AMOUNT
    count  598565.000000        5.985650e+05
    mean        6.410199        2.457275e+07
    std         3.446896        2.732986e+08
    min         1.000000        2.000000e-02
    25%         3.000000        1.823501e+04
    50%         6.000000        1.649049e+05
    75%         9.000000        1.318875e+06
    max        12.000000        1.694837e+10

    from sklearn import model_selection
    from sklearn.ensemble import IsolationForest, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import LocalOutlierFactor

    # Ensemble by stacking
    estimator_list = [
        ('lof', LocalOutlierFactor(novelty=False, n_neighbors=20, contamination='auto')),
        ('iforest', IsolationForest(n_estimators=100, contamination='auto'))
    ]
    ensemble = StackingClassifier(estimators=estimator_list,
                                  final_estimator=LogisticRegression(), cv=5)

    # Set the number of folds and how parameter values are shuffled
    kf = model_selection.KFold(n_splits=10, random_state=10, shuffle=True)

    # Evaluate model using cross-validation
    ensemble_cross_vald = model_selection.cross_val_score(
        ensemble,
        df_train[['TRXN_MONTH']].values,
        df_train[['TRANSACTION_AMOUNT']].values,
        cv=kf, n_jobs=-1, scoring='recall')
    ensemble_cross_vald

    y_pred = lof.fit_predict(df)
    lofs_index = where(y_pred==-1)
    lofs_index

    (array([ 17, 43, 61, ..., 598553, 598561, 598562]),)

    y_pred = iforest.fit_predict(df)
    lofs_index = where(y_pred==-1)
    lofs_index

    (array([ 6, 14, 15, ..., 598549, 598556, 598561]),)

1 answer

User

Reply #1 · posted 2024-04-26 14:29:20

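None of the base models you are stacking here is a classifier: both are outlier and/or anomaly detection algorithms. They are not even supervised models, and they do not use the labels y at all, as you can see from the documentation of their respective fit methods: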

For LOF:

fit(X, y=None)
y : Ignored
Not used, present for API consistency by convention.

For Isolation Forest:

fit(X, y=None, sample_weight=None)
y : Ignored
Not used, present for API consistency by convention.
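The "y : Ignored" entries quoted above can be checked empirically. The following is a minimal sketch on synthetic data (the arrays `X`, `y_a`, and `y_b` are made up for illustration and are not the asker's dataset): each detector is fitted twice with two unrelated label vectors, and the predictions are compared.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: an assumption for illustration only.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
y_a = np.zeros(200, dtype=int)        # one arbitrary label vector
y_b = rng.randint(0, 2, size=200)     # a completely different label vector

# Isolation Forest: a fixed random_state makes the two fits comparable.
iforest_a = IsolationForest(n_estimators=100, contamination='auto',
                            random_state=0).fit(X, y_a)
iforest_b = IsolationForest(n_estimators=100, contamination='auto',
                            random_state=0).fit(X, y_b)
same_iforest = np.array_equal(iforest_a.predict(X), iforest_b.predict(X))

# LOF with novelty=False (the default) exposes fit_predict on the training data.
lof_a = LocalOutlierFactor(n_neighbors=20, contamination='auto').fit_predict(X, y_a)
lof_b = LocalOutlierFactor(n_neighbors=20, contamination='auto').fit_predict(X, y_b)
same_lof = np.array_equal(lof_a, lof_b)

print("IsolationForest unaffected by y:", same_iforest)
print("LOF unaffected by y:", same_lof)
```

If the labels had any influence on training, swapping `y_a` for `y_b` would be expected to change at least some predictions.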

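Since they are not classifiers (and do not involve the labels y in any way), these models clearly cannot be used for classification tasks, neither on their own nor as base classifiers of a stacking model, as you are attempting here. The NaN values for the recall metric are therefore expected: from the models' point of view there are no labels y, so there is no recall to compute in the first place.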
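To see what goes wrong underneath the NaN scores, `cross_val_score` accepts `error_score='raise'`, which re-raises the exception from a failed fit instead of substituting a placeholder score. The sketch below rebuilds the ensemble from the question on small synthetic data with hypothetical binary labels (both are assumptions, not the asker's data); the exact exception type and message may differ between scikit-learn versions, so the code only captures and prints whatever is raised.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import LocalOutlierFactor

# Synthetic stand-in data with hypothetical binary labels (illustration only).
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 1))
y = rng.randint(0, 2, size=100)

# Same ensemble definition as in the question.
ensemble = StackingClassifier(
    estimators=[
        ('lof', LocalOutlierFactor(novelty=False, n_neighbors=20, contamination='auto')),
        ('iforest', IsolationForest(n_estimators=100, contamination='auto')),
    ],
    final_estimator=LogisticRegression(),
    cv=5,
)
kf = KFold(n_splits=10, random_state=10, shuffle=True)

error_message = None
try:
    # error_score='raise' propagates the fit failure instead of hiding it.
    cross_val_score(ensemble, X, y, cv=kf, scoring='recall', error_score='raise')
except Exception as exc:  # broad on purpose: the type may vary by version
    error_message = f"{type(exc).__name__}: {exc}"

print(error_message)
```

Running the evaluation this way once, before switching back to the default `error_score`, is a quick way to tell a genuine fit failure apart from a metric that is merely undefined.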
