LightGBMError: Check failed: (train_data->num_features()) > (0)

Published 2024-06-11 00:26:35


I am getting the error

[LightGBM] [Fatal] Check failed: (train_data->num_features()) > (0)

on a dataset X of shape (40, 7), while trying to run gradient boosting with a custom loss function.

Any solution or hint would be appreciated.

The error is raised on this line:

gbm.fit(
    X_train,
    y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric=custom_asymmetric_valid,
    verbose=False,
)

Here is the full code:

import lightgbm
import lightgbm as lgb  # the lgb alias is used below by lgb.Dataset / lgb.train
import pandas as pd 
from sklearn.model_selection import train_test_split
import numpy as np

train = pd.read_csv("Data_Train.csv")
X, y = train.iloc[:, 1:-1], train.iloc[:, -1] 

X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.20, random_state=42)
print(np.shape(X_train),np.shape(X_valid))


test = pd.read_csv("Data_Test.csv")
X_test, y_test = test.iloc[:, 1:-1], test.iloc[:, -1] 

# Defining custom loss function

def custom_asymmetric_train(y_true, y_pred):
    residual = (y_true - y_pred).astype("float")
    grad = np.where(residual<0, -2*10.0*residual, -2*residual)
    hess = np.where(residual<0, 2*10.0, 2.0)
    return grad, hess

def custom_asymmetric_valid(y_true, y_pred):
    residual = (y_true - y_pred).astype("float")
    loss = np.where(residual < 0, (residual**2)*10.0, residual**2) 
    return "custom_asymmetric_eval", np.mean(loss), False

# default lightgbm model with sklearn api
gbm = lightgbm.LGBMRegressor(random_state=33) 

# updating objective function to custom
# default is "regression"
# also adding metrics to check different scores
gbm.set_params(objective=custom_asymmetric_train, metrics=["mse", "mae"])

# fitting model 
gbm.fit(
    X_train,
    y_train,
    eval_set=[(X_valid, y_valid)],
    eval_metric=custom_asymmetric_valid,
    verbose=False,
)

y_pred = gbm.predict(X_valid)


# create dataset for lightgbm

lgb_train = lgb.Dataset(X_train, y_train, free_raw_data=False)
lgb_eval = lgb.Dataset(X_valid, y_valid, reference=lgb_train, free_raw_data=False)


params = {
    'objective': 'regression',
    'verbose': 0
}

gbm = lgb.train(params,
                lgb_train,
                num_boost_round=10,
                init_model=gbm,
                fobj=custom_asymmetric_train,
                feval=custom_asymmetric_valid,
                valid_sets=lgb_eval)
                
y_pred = gbm.predict(X_valid)
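As a side note, the custom objective above is internally consistent: the gradient and Hessian returned by custom_asymmetric_train are the derivatives of the loss evaluated by custom_asymmetric_valid. A quick standard-library sketch checking this with a central finite difference (scalar versions of the functions above; the step size h is an arbitrary choice):

```python
def asymmetric_loss(y_true, y_pred):
    # Same penalty as custom_asymmetric_valid: 10x weight on negative residuals
    residual = y_true - y_pred
    return 10.0 * residual ** 2 if residual < 0 else residual ** 2

def asymmetric_grad(y_true, y_pred):
    # Same formula as custom_asymmetric_train, for a single point
    residual = y_true - y_pred
    return -2 * 10.0 * residual if residual < 0 else -2 * residual

# Compare the analytic gradient with a central finite difference
h = 1e-6
for y_true, y_pred in [(1.0, 0.5), (0.5, 1.0)]:
    numeric = (asymmetric_loss(y_true, y_pred + h)
               - asymmetric_loss(y_true, y_pred - h)) / (2 * h)
    analytic = asymmetric_grad(y_true, y_pred)
    assert abs(numeric - analytic) < 1e-4
```

Both branches check out: for a positive residual of 0.5 the gradient is -1.0, and for a negative residual of -0.5 the 10x penalty gives 10.0.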


1 Answer

Your original example is not fully reproducible (the contents of "Data_Train.csv" were not shared), but I can reliably reproduce the error message you mention with LightGBM 3.1.1 (installed via pip install lightgbm) using the following code:

import lightgbm as lgb
import numpy as np
import pandas as pd

np.random.seed(708)

def custom_asymmetric_train(y_true, y_pred):
    residual = (y_true - y_pred).astype("float")
    grad = np.where(residual<0, -2*10.0*residual, -2*residual)
    hess = np.where(residual<0, 2*10.0, 2.0)
    return grad, hess

# create a training dataset of shape (40, 7)
X = pd.DataFrame({
    f"feat_{i}": np.random.random((40,))
    for i in range(7)
})
y = np.random.random((40,))

gbm = lgb.LGBMRegressor()
gbm.set_params(objective=custom_asymmetric_train, metrics=["mse", "mae"])
gbm.fit(X, y)

LightGBMError: Check failed: (train_data->num_features()) > (0)

LightGBM has several parameters designed to prevent overfitting. Two of them are relevant here:

  • min_data_in_leaf (default = 20)
  • min_sum_hessian_in_leaf (default = 0.001)
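The same two thresholds can also be overridden through the params dict of the native lgb.train API used in the second half of the question; a sketch, assuming the LightGBM 3.x parameter names:

```python
# Relax the two small-data guards for the native lgb.train API;
# merge these keys into the existing params dict before training.
params = {
    "objective": "regression",
    "min_data_in_leaf": 0,           # default 20
    "min_sum_hessian_in_leaf": 0.0,  # default 0.001
    "verbose": 0,
}
```

With these in place, `lgb.train(params, lgb_train, ...)` builds a Dataset in which no feature is pre-filtered.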

By default, while constructing the Dataset object, LightGBM filters out features that cannot be split under those conditions (see the Dataset parameters in the LightGBM documentation).

LightGBM's parameter defaults are chosen to give good performance on medium-sized datasets. A dataset of shape (40, 7) is very small, which increases the risk that every feature is filtered out as unusable.
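To make that risk concrete: with min_data_in_leaf = 20 and 40 rows, a split is admissible only when both children receive at least 20 rows, and a quick count shows that leaves exactly one candidate split position per feature:

```python
n_rows = 40
min_data_in_leaf = 20

# A split sending k rows to the left child is admissible only if
# both children end up with at least min_data_in_leaf rows.
admissible = [k for k in range(1, n_rows)
              if k >= min_data_in_leaf and n_rows - k >= min_data_in_leaf]
print(admissible)  # [20]: a single 20/20 split is the only option
```

Once feature values are bucketed into histogram bins, even that single position may not be available, at which point the feature can be filtered out entirely.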

To accommodate such a small dataset, you can override those defaults and set them to 0 (or other small values). The following code trains successfully without errors:

import lightgbm as lgb
import numpy as np
import pandas as pd

np.random.seed(708)

def custom_asymmetric_train(y_true, y_pred):
    residual = (y_true - y_pred).astype("float")
    grad = np.where(residual<0, -2*10.0*residual, -2*residual)
    hess = np.where(residual<0, 2*10.0, 2.0)
    return grad, hess

# create a training dataset of shape (40, 7)
X = pd.DataFrame({
    f"feat_{i}": np.random.random((40,))
    for i in range(7)
})
y = np.random.random((40,))

gbm = lgb.LGBMRegressor(
    min_sum_hessian_in_leaf=0,
    min_data_in_leaf=0
)
gbm.set_params(objective=custom_asymmetric_train, metrics=["mse", "mae"])
gbm.fit(X, y)
