Testing new data on a trained LGBM model

#below is while training the model
cvname = CountVectorizer(min_df=NAME_MIN_DF)
X_name = cvname.fit_transform(merge['name'])
pickle.dump(cvname, open("namevector.pkl", "wb"))
. . . .
#after completing the training, and loading the new data
handle_missing_inplace(mytest)
cutting(mytest)
to_categorical(mytest)
cv1 = pickle.load(open("namevector.pkl", "rb"))
X_name1 = cv1.transform(mytest['name'])
cv2 = pickle.load(open("categoryvector.pkl", "rb"))
X_category1 = cv2.transform(mytest['category_name'])
tv1 = pickle.load(open("descriptionvector.pkl", "rb"))
X_description1 = tv1.transform(mytest['item_description'])
lb1 = pickle.load(open("brandvector.pkl", "rb"))
X_brand1 = lb1.transform(mytest['brand_name'])
t1 = pd.get_dummies(mytest[['item_condition_id', 'shipping']],sparse=True)
X_dummies1 = csr_matrix(t1.values.astype('int64'))
sparse_merge1 = hstack((X_dummies1, X_description1, X_brand1, X_category1, X_name1)).tocsr()
X_test1 = sparse_merge1
my_pred = pkl_bst1.predict(X_test1)
mysubmission['price'] = np.expm1(my_pred)

1 answer

User

#1 · Posted on 2024-06-16 10:40:48

This is usually called overfitting. Or it could be underfitting. Like other ML algorithms, LGBM is susceptible to both.

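It means the model does well on the training and test data but performs poorly on new data. The model is not generalizing well; it is only memorizing the training data. There are some suggestions on how to deal with LGBM overfitting, but there is a lot of information out there on this problem, and you should take the time to read it. Google is the usual starting point.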
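One way to see the gap described above is to score the same model on the data it was fit on and on data it has never seen. Below is a minimal sketch in plain Python. The `rmse` helper, the `MemorizingModel` class, and the data are all made up for illustration; the toy model is a lookup table standing in for an overfit model, not LightGBM itself.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


class MemorizingModel:
    """Toy model that stores every training pair and falls back to the mean.

    Hypothetical stand-in for an overfit model; it is not LightGBM.
    """

    def fit(self, X, y):
        self.table = dict(zip(X, y))      # memorize each (input, target) pair
        self.fallback = sum(y) / len(y)   # used for inputs never seen in training
        return self

    def predict(self, X):
        return [self.table.get(x, self.fallback) for x in X]


# Made-up data: the target is simply 2 * x.
X_train, y_train = [1, 2, 3, 4], [2.0, 4.0, 6.0, 8.0]
X_new, y_new = [5, 6], [10.0, 12.0]

model = MemorizingModel().fit(X_train, y_train)
train_error = rmse(y_train, model.predict(X_train))
new_error = rmse(y_new, model.predict(X_new))

# A training error far below the error on unseen data suggests memorization.
print(f"train RMSE={train_error:.3f}, new-data RMSE={new_error:.3f}")
```

With a real model, the same comparison would use the model's own `predict` on a held-out split that played no part in training or tuning.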

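Collecting more data is sometimes the way to solve the problem. Hundreds of thousands of rows, or millions. Machine learning is a field that needs a lot of data.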

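You will have to adjust some of the model parameters and do a lot of training runs until your predictions start to improve (if they do at all). This is called parameter tuning.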
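The tuning loop described above can be sketched as a small grid search. This is only an illustration under stated assumptions: `grid_search` and `fake_validation_error` are hypothetical helpers, and the scoring function is a made-up formula standing in for "train the model with these parameters and return its validation error". `num_leaves` and `learning_rate` are real LightGBM parameter names, but the candidate values here are arbitrary.

```python
import itertools


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; return (best_params, best_score).

    score_fn takes a dict of parameters and returns a validation error,
    where lower is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def fake_validation_error(params):
    """Hypothetical stand-in for training a model and scoring it on held-out data."""
    return abs(params["num_leaves"] - 31) / 100 + abs(params["learning_rate"] - 0.1)


grid = {"num_leaves": [15, 31, 63], "learning_rate": [0.05, 0.1, 0.3]}
best_params, best_score = grid_search(grid, fake_validation_error)
print(best_params, best_score)
```

In practice the scoring function would fit the model on the training split and measure error on a validation split, which is why each added parameter multiplies the number of training runs.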

This is the hardest side of ML to deal with.

Don't be discouraged, though.
