How to impute multiple specific columns in a dataset: Python (sklearn)

Posted 2024-06-10 21:01:09


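No time to waste, so straight to the question. I am imputing a dataset in Python with sklearn's SimpleImputer. My dataset contains some columns with integer values and some columns with alphabetic (text) values. I am using the median to fill the missing values, so I only want to impute the specific integer columns, not the whole dataset. I tried this: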

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(strategy="median")
imputer.fit(students['age', 'sex', 'failures'])

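I only want to impute these columns, which contain only integer values, rather than the whole dataset, because the dataset also has columns of alphabetic data points, and a median cannot be computed for those.

The code above gave me the following error: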

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2894             try:
-> 2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('age', 'sex', 'failures')

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
<ipython-input-26-8961e0ce249f> in <module>
      2 from sklearn.impute import SimpleImputer
      3 imputer = SimpleImputer(strategy="median")
----> 4 imputer.fit(students['age', 'sex', 'failures'])

~\anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2900             if self.columns.nlevels > 1:
   2901                 return self._getitem_multilevel(key)
-> 2902             indexer = self.columns.get_loc(key)
   2903             if is_integer(indexer):
   2904                 indexer = [indexer]

 ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2895                 return self._engine.get_loc(casted_key)
   2896             except KeyError as err:
 -> 2897                 raise KeyError(key) from err
   2898 
   2899         if tolerance is not None:

KeyError: ('age', 'sex', 'failures')
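
The KeyError most likely comes from how the columns are selected: `students['age', 'sex', 'failures']` passes a single tuple as the key, so pandas looks for one column with that label. Selecting several columns takes a list of labels, i.e. double brackets. Below is a minimal sketch of imputing only the numeric columns. The small `students` frame and its values are made up for illustration, and `sex` is assumed to be stored as text here, so it is left out of the median imputation.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical stand-in for the `students` DataFrame; the values are made up.
students = pd.DataFrame({
    "sex": ["F", "M", "F", "M"],          # assumed to be text, not integers
    "age": [15.0, np.nan, 17.0, 18.0],
    "failures": [0.0, 1.0, np.nan, 3.0],
})

# A tuple key is treated as ONE column label, which reproduces the KeyError.
try:
    students["age", "sex", "failures"]
    tuple_key_failed = False
except KeyError:
    tuple_key_failed = True

# Pick only the numeric columns; an explicit list such as
# ["age", "failures"] would work the same way.
numeric_cols = students.select_dtypes(include="number").columns

imputer = SimpleImputer(strategy="median")
# fit_transform returns a NumPy array, so assign it back to the same columns.
students[numeric_cols] = imputer.fit_transform(students[numeric_cols])
```

Text columns stay untouched this way. If a text column also needs imputing, a second `SimpleImputer(strategy="most_frequent")` applied to those columns is one option.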

The data is available at https://archive.ics.uci.edu/ml/machine-learning-databases/00320/

Thanks! I hope the question is clear; I have tried my best to explain it.

