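I want to make predictions on the Titanic dataset.
I wanted to try CatBoost, so I followed this guide: https://www.analyticsvidhya.com/blog/2017/08/catboost-automated-categorical-data/ However, when I try to reproduce it, it doesn't work.
I followed the guide closely. I assumed CatBoost would take care of all the data conversion, because, as you can see in the guide, the author uses columns with the dtypes object, float, and int.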
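CatBoost does not pick categorical columns on its own here: the guide computes their positions from the dtypes with `np.where(X.dtypes != float)[0]` and passes them as `cat_features`. The sketch below uses a toy frame (the column names mirror Titanic, the values are invented) to show which columns that call flags, and how `replace(np.nan, '', regex=True)` changes the answer. A float column with gaps such as `Age` becomes an `object` column mixing floats and empty strings, so it is flagged as categorical. CatBoost only accepts integer or string values in categorical columns, so values like `22.0` would then be rejected. The helper name `categorical_indices` is hypothetical.

```python
import numpy as np
import pandas as pd


def categorical_indices(frame):
    """Positions of every non-float column, mirroring the np.where call in the guide."""
    return np.where(frame.dtypes != float)[0]


# Toy data, invented for illustration: Age and Cabin each have a missing value.
df = pd.DataFrame({
    "Pclass": [3, 1],
    "Age": [22.0, np.nan],
    "Cabin": [np.nan, "C85"],
})

print(categorical_indices(df))       # [0 2]: Pclass and Cabin

# The line from the question: NaN -> '' turns Age into an object column.
blanked = df.replace(np.nan, "", regex=True)
print(blanked["Age"].dtype)          # object
print(categorical_indices(blanked))  # [0 1 2]: Age is now flagged as categorical too
```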
```python
import numpy as np
import pandas as pd
from catboost import CatBoostRegressor
from sklearn.model_selection import train_test_split

train = pd.read_csv('train.csv')
test = pd.read_csv('test.csv')

# Fill missing values with a sentinel, as the guide does. Do not also call
# train.replace(np.nan, '', regex=True): it turns float columns with gaps
# (e.g. Age) into object columns mixing floats and '', which are then
# picked up as categorical, and CatBoost rejects float categorical values.
train.fillna(-999, inplace=True)
test.fillna(-999, inplace=True)

y = train.Survived
train_features = ['Pclass', 'Name', 'Sex', 'Age', 'SibSp',
                  'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']
X = train[train_features]

# X has to exist before its dtypes can be inspected. np.float was removed in
# NumPy 1.24, so compare against the built-in float instead.
categorical_features_indices = np.where(X.dtypes != float)[0]

X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, train_size=0.7, random_state=1234)

# Build the model. Survived is 0/1, so CatBoostClassifier would arguably fit
# better; the regressor from the guide is kept here.
model = CatBoostRegressor(iterations=50, depth=3, learning_rate=0.1, loss_function='RMSE')
model.fit(X_train, y_train, cat_features=categorical_features_indices,
          eval_set=(X_validation, y_validation), plot=True)
```
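If the `-999` sentinel is unwanted, another option is to cast only the categorical columns to strings and leave gaps in float columns as NaN, since CatBoost handles missing values in numeric features itself. This is a sketch under that assumption, not what the guide does. The helper name `prepare_for_catboost` and the `"missing"` token are hypothetical. The returned column names can be passed as `cat_features`, because CatBoost accepts column names as well as indices for DataFrame input.

```python
import numpy as np
import pandas as pd


def prepare_for_catboost(frame, missing_token="missing"):
    """Hypothetical helper: cast every non-float column to str for use as a categorical feature.

    Float columns are left untouched, NaN included. Returns a prepared copy
    and the list of categorical column names.
    """
    out = frame.copy()
    cat_columns = [c for c in out.columns if not pd.api.types.is_float_dtype(out[c])]
    for column in cat_columns:
        # Fill gaps first so NaN does not turn into the string 'nan'.
        out[column] = out[column].fillna(missing_token).astype(str)
    return out, cat_columns


if __name__ == "__main__":
    toy = pd.DataFrame({"Pclass": [3, 1], "Age": [22.0, np.nan], "Cabin": [np.nan, "C85"]})
    prepared, cat_columns = prepare_for_catboost(toy)
    print(cat_columns)                 # ['Pclass', 'Cabin']
    print(prepared["Cabin"].tolist())  # ['missing', 'C85']
```

With the question's variables this would be `X_prepared, cat_columns = prepare_for_catboost(X)` before the split, and then `cat_features=cat_columns` in `model.fit`.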
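I tried running the code from the link you provided and it worked. Maybe the difference comes from your Python version.
There is a similar issue on GitHub where it was reported that different Python versions gave different results. You could try switching to a different Python version.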
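To compare two setups, it helps to record the interpreter version together with the relevant package versions. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+) follows; the function name `environment_report` and the package list are assumptions, not taken from the thread.

```python
import sys
from importlib import metadata


def environment_report(packages=("catboost", "pandas", "numpy", "scikit-learn")):
    """Collect the Python version and installed package versions (hypothetical helper)."""
    report = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


if __name__ == "__main__":
    for name, version in environment_report().items():
        print(f"{name}: {version}")
```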
My specs: