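I am trying to do something simple: trim all whitespace in every column of my DataFrame. Some values have trailing spaces after words or leading spaces before them, and some columns contain only " " values. I want to strip all of that.
I read this post, which offers a neat way to do this: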
data_frame_trimmed = data_frame.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
However, I keep getting the following error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-9-31d35db1d48c> in <module>
1 df = (pd.read_csv('C:\\Users\\wundermahn\Desktop\\aggregated_po_data.csv',
----> 2 encoding = "ISO-8859-1", low_memory=False).apply(lambda x: x.str.strip() if (x.dtype == "object") else x))
3 print(df.shape)
4
5 label = df['class']
c:\python367-64\lib\site-packages\pandas\core\frame.py in apply(self, func, axis, raw, result_type, args, **kwds)
6876 kwds=kwds,
6877 )
-> 6878 return op.get_result()
6879
6880 def applymap(self, func) -> "DataFrame":
c:\python367-64\lib\site-packages\pandas\core\apply.py in get_result(self)
184 return self.apply_raw()
185
--> 186 return self.apply_standard()
187
188 def apply_empty_result(self):
c:\python367-64\lib\site-packages\pandas\core\apply.py in apply_standard(self)
294 try:
295 result = libreduction.compute_reduction(
--> 296 values, self.f, axis=self.axis, dummy=dummy, labels=labels
297 )
298 except ValueError as err:
pandas\_libs\reduction.pyx in pandas._libs.reduction.compute_reduction()
pandas\_libs\reduction.pyx in pandas._libs.reduction.Reducer.get_result()
<ipython-input-9-31d35db1d48c> in <lambda>(x)
1 df = (pd.read_csv('C:\\Users\\wundermahn\Desktop\\aggregated_data.csv',
----> 2 encoding = "ISO-8859-1", low_memory=False).apply(lambda x: x.str.strip() if (x.dtype == "object") else x))
3 print(df.shape)
4
5 label = df['ON_TIME']
c:\python367-64\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5268 or name in self._accessors
5269 ):
-> 5270 return object.__getattribute__(self, name)
5271 else:
5272 if self._info_axis._can_hold_identifiers_and_holds_name(name):
c:\python367-64\lib\site-packages\pandas\core\accessor.py in __get__(self, obj, cls)
185 # we're accessing the attribute of the class, i.e., Dataset.geo
186 return self._accessor
--> 187 accessor_obj = self._accessor(obj)
188 # Replace the property with the accessor object. Inspired by:
189 # http://www.pydanny.com/cached-property.html
c:\python367-64\lib\site-packages\pandas\core\strings.py in __init__(self, data)
2039
2040 def __init__(self, data):
-> 2041 self._inferred_dtype = self._validate(data)
2042 self._is_categorical = is_categorical_dtype(data)
2043 self._is_string = data.dtype.name == "string"
c:\python367-64\lib\site-packages\pandas\core\strings.py in _validate(data)
2096
2097 if inferred_dtype not in allowed_types:
-> 2098 raise AttributeError("Can only use .str accessor with string values!")
2099 return inferred_dtype
2100
**AttributeError: Can only use .str accessor with string values!**
So, while looking for a workaround, I came across this post, which suggests using:
data_frame_trimmed = data_frame.apply(lambda x: x.str.strip() if x.dtype == "str" else x)
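However, this does not strip empty cells that contain only spaces or tabs.
How can I efficiently strip all kinds of whitespace? Eventually I will drop the columns in which more than 50% of the values are null.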
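For the cells that hold only spaces or tabs, and for the 50%-null rule mentioned above, one possible sketch is shown here. It assumes that empty or whitespace-only strings should count as missing; the sample frame and the names `cleaned`, `null_share` and `kept` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the data from the question
df = pd.DataFrame({
    "mostly_blank": [" ", "\t", "", "x"],
    "half_blank": ["a", "  ", "b", ""],
    "full": [1, 2, 3, 4],
})

# Assumption: empty or whitespace-only strings are treated as missing values
cleaned = df.replace(r"^\s*$", np.nan, regex=True)

# Fraction of missing values per column
null_share = cleaned.isna().mean()

# Keep only the columns where at most half of the values are missing
kept = cleaned.loc[:, null_share <= 0.5]
```

Here `mostly_blank` is dropped because three of its four values are missing, while `half_blank` sits exactly at 50% and is kept.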
You need to check the type of each individual value rather than the type of the column, so the code could be, for example:
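The original snippet is missing here, so the following is only a minimal sketch of a per-value check, not necessarily the answerer's exact code. The helper `strip_if_str` and the sample data are hypothetical, and `Series.map` is applied column by column to keep the sketch independent of the pandas version.

```python
import numpy as np
import pandas as pd

def strip_if_str(value):
    """Strip surrounding whitespace from strings; return anything else unchanged."""
    return value.strip() if isinstance(value, str) else value

# Hypothetical sample data
df = pd.DataFrame({
    "name": ["  alice ", "bob\t", " "],
    "mixed": [" x ", 3.5, np.nan],   # strings mixed with non-string values
    "count": [1, 2, 3],
})

# Apply the per-value check to every cell, one column at a time
trimmed = df.apply(lambda col: col.map(strip_if_str))
```

Whitespace-only cells become empty strings rather than null values, so they still need separate handling if they should count as missing.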
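The reason is that a column of dtype object is not guaranteed to contain only strings: it can hold any Python objects, and the .str accessor raises this AttributeError when it does not find string values in the column.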
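However, this needlessly runs the code on columns of types other than object, where nothing will change. If that bothers you, run it only on the columns where there may be something to change: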
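A minimal sketch of that restriction, assuming text is stored in object-dtype columns as in the question's pandas version. The helper `strip_if_str`, the sample data and the name `text_cols` are hypothetical.

```python
import pandas as pd

def strip_if_str(value):
    """Strip surrounding whitespace from strings; return anything else unchanged."""
    return value.strip() if isinstance(value, str) else value

# Hypothetical sample data
df = pd.DataFrame({
    "city": [" Paris", "Rome  ", "\t", 7.0],   # object dtype: strings plus a stray number
    "price": [1.5, 2.0, 3.25, 4.0],            # float dtype: nothing to strip
})

# Select only the object-dtype columns, where strings may live
text_cols = df.select_dtypes(include="object").columns

# Work on a copy and overwrite just the selected columns
trimmed = df.copy()
trimmed[text_cols] = trimmed[text_cols].apply(lambda col: col.map(strip_if_str))
```

The numeric column is never visited, which is the saving this variant is after.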
First, use select_dtypes to select the right columns:
You can also try using try:
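The code for the try-based suggestion is missing, so this is only a guess at its shape: attempt the vectorised `.str.strip()` on each column and fall back to the unchanged column when the accessor refuses it. The function name `strip_column` and the sample data are hypothetical.

```python
import pandas as pd

def strip_column(col):
    """Try the vectorised .str.strip(); return the column unchanged if it fails."""
    try:
        return col.str.strip()
    except AttributeError:
        # Raised by the .str accessor when the column does not hold string values
        return col

# Hypothetical sample data
df = pd.DataFrame({"label": [" yes ", "no  "], "score": [0.5, 0.75]})

trimmed = df.apply(strip_column)
```

One caveat: on an object column that mixes strings with other values, `.str.strip()` returns NaN for the non-string entries, which the per-value check avoids.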