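I have an X_train array of shape (14599, 13) and I am trying to impute its NaN values with the column medians. Somehow it seems to be imputing along the rows, which results in an error because the rows contain dates rather than integer values. I have looked for an axis parameter on SimpleImputer but could not find one. How can I solve this?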
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

plt.close('all')

# Load the data and give the PLU columns readable names
avo_sales = pd.read_csv('avocados.csv')
avo_sales.rename(columns={'4046': 'small PLU sold',
                          '4225': 'large PLU sold',
                          '4770': 'xlarge PLU sold'},
                 inplace=True)
# Remove spaces from column names so they can be used as attributes
avo_sales.columns = avo_sales.columns.str.replace(' ', '')

plt.scatter(avo_sales.Date, avo_sales.TotalBags)

# Features are every column except the target (Date is still a string here)
x = np.array(avo_sales.drop(columns=['TotalBags']))
y = np.array(avo_sales.TotalBags)
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

# This call raises the ValueError shown below
imp = SimpleImputer(strategy='median')
X_train = imp.fit_transform(X_train)
Output:
ValueError: Cannot use median strategy with non-numeric data:
could not convert string to float: '12/31/2017'
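When imputing, you can try dropping the date column first.
Replace the name "date_column" with the actual name of the date column in your data.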
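A minimal sketch of dropping the date column before imputing might look like the following. The toy DataFrame and the column names ("date_column", "AveragePrice", "TotalVolume") are placeholders chosen for illustration, not taken from the asker's file. SimpleImputer has no axis parameter and computes its statistics per column, so once only numeric columns remain the median strategy should work as intended.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy stand-in for the training features; all names here are hypothetical.
X_train = pd.DataFrame({
    "date_column": ["12/31/2017", "12/24/2017", "12/17/2017"],
    "AveragePrice": [1.0, np.nan, 3.0],
    "TotalVolume": [10.0, 20.0, np.nan],
})

# Drop the string-valued date column so only numeric columns reach the imputer.
numeric_features = X_train.drop(columns=["date_column"])

# The median is computed separately for each remaining column.
imp = SimpleImputer(strategy="median")
X_imputed = imp.fit_transform(numeric_features)
```

If the data contains other non-numeric columns, they would presumably trigger the same error; selecting only numeric columns with `X_train.select_dtypes(include="number")` is one way to handle them all at once.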
Alternatively, you could convert the date column from strings to date objects, although I'm not sure whether SimpleImputer can handle date types.
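One way to sidestep that uncertainty is to parse the dates and then turn them into a plain number before imputing. The sketch below assumes the dates follow the month/day/year pattern seen in the error message, and uses days since 1970-01-01 as the numeric encoding; both the encoding and the toy data are illustrative choices, not something the question specifies.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy data; the "AveragePrice" column is a hypothetical numeric feature.
df = pd.DataFrame({
    "Date": ["12/31/2017", "12/24/2017", "12/17/2017"],
    "AveragePrice": [1.0, np.nan, 3.0],
})

# Parse the strings into datetime objects (format assumed from the error message).
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Encode each date as whole days since the Unix epoch so the column is numeric.
df["Date"] = (df["Date"] - pd.Timestamp("1970-01-01")).dt.days

# Every column is numeric now, so the column-wise median can be computed.
imp = SimpleImputer(strategy="median")
X_imputed = imp.fit_transform(df)
```

This keeps the date information available to a downstream model instead of discarding it, at the cost of committing to one particular numeric encoding.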