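To answer your question: this is a good pattern to start with, and it is the recommended one in most cases.
The more important question, in my opinion, is what kind of user data you have and how it behaves with the chosen model:
- your data has a large number of features: you probably want to run PCA, XGBoost feature importances, or another feature-importance evaluation to separate useful features from noise features
- you have a large amount of text data, e.g. logs: you might want to try naive Bayes, TF-IDF, or another model that performs well on text-based data
- model X tends to overfit on your data: you may want to do some data engineering or try a different model
My advice is to build the LR model first, see how it performs on your train/test/predict datasets, and evaluate whether that performance meets your needs before you consider or discuss different models or methods.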
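A rough sketch of that baseline-first workflow is below. It is only an illustration under several assumptions that are not in the question: "LR" is read as logistic regression, the task is binary classification, scikit-learn is available, and the synthetic data from `make_classification` stands in for your real dataset. Comparing the train and test scores gives a first hint about overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic binary-classification data as a stand-in for your own.
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5, random_state=0
)

# Hold out a test set so the baseline is judged on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Baseline: scale the features, then fit a plain logistic regression.
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
baseline.fit(X_train, y_train)

train_acc = accuracy_score(y_train, baseline.predict(X_train))
test_acc = accuracy_score(y_test, baseline.predict(X_test))

# A train score far above the test score suggests overfitting.
print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
print(f"gap:            {train_acc - test_acc:.3f}")
```

If the test score already meets your needs, the baseline may be enough; if not, these numbers are the reference point that any other model or method has to beat.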