Apply a function to each row of a pandas DataFrame to create two new columns

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 53732 entries, 1993-01-07 12:23:58 to 2012-12-02 20:06:23
Data columns:
Date(dd-mm-yy)_Time(hh-mm-ss)       53732  non-null values
Julian_Day                          53732  non-null values
AOT_1020                            53716  non-null values
AOT_870                             53732  non-null values
AOT_675                             53188  non-null values
AOT_500                             51687  non-null values
AOT_440                             53727  non-null values
AOT_380                             51864  non-null values
AOT_340                             52852  non-null values
Water(cm)                           51687  non-null values
%TripletVar_1020                    53710  non-null values
%TripletVar_870                     53726  non-null values
%TripletVar_675                     53182  non-null values
%TripletVar_500                     51683  non-null values
%TripletVar_440                     53721  non-null values
%TripletVar_380                     51860  non-null values
%TripletVar_340                     52846  non-null values
440-870Angstrom                     53732  non-null values
380-500Angstrom                     52253  non-null values
440-675Angstrom                     53732  non-null values
500-870Angstrom                     53732  non-null values
340-440Angstrom                     53277  non-null values
Last_Processing_Date(dd/mm/yyyy)    53732  non-null values
Solar_Zenith_Angle                  53732  non-null values
dtypes: datetime64[ns](1), float64(22), object(1)

st.apply(calculate, axis=1)
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-248-acb7a44054a7> in <module>()
----> 1 st.apply(calculate, axis=1)

C:\Python27\lib\site-packages\pandas\core\frame.pyc in apply(self, func, axis, broadcast, raw, args, **kwds)
   4191                     return self._apply_raw(f, axis)
   4192                 else:
-> 4193                     return self._apply_standard(f, axis)
   4194         else:
   4195             return self._apply_broadcast(f, axis)

C:\Python27\lib\site-packages\pandas\core\frame.pyc in _apply_standard(self, func, axis, ignore_failures)
   4274                 index = None
   4275
-> 4276         result = self._constructor(data=results, index=index)
   4277         result.rename(columns=dict(zip(range(len(res_index)), res_index)),
   4278                       inplace=True)

C:\Python27\lib\site-packages\pandas\core\frame.pyc in __init__(self, data, index, columns, dtype, copy)
    390             mgr = self._init_mgr(data, index, columns, dtype=dtype, copy=copy)
    391         elif isinstance(data, dict):
--> 392             mgr = self._init_dict(data, index, columns, dtype=dtype)
    393         elif isinstance(data, ma.MaskedArray):
    394             mask = ma.getmaskarray(data)

C:\Python27\lib\site-packages\pandas\core\frame.pyc in _init_dict(self, data, index, columns, dtype)
    521
    522         return _arrays_to_mgr(arrays, data_names, index, columns,
--> 523                               dtype=dtype)
    524
    525     def _init_ndarray(self, values, index, columns, dtype=None,

C:\Python27\lib\site-packages\pandas\core\frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
   5411
   5412     # consolidate for now
-> 5413     mgr = BlockManager(blocks, axes)
   5414     return mgr.consolidate()
   5415

C:\Python27\lib\site-packages\pandas\core\internals.pyc in __init__(self, blocks, axes, do_integrity_check)
    802
    803         if do_integrity_check:
--> 804             self._verify_integrity()
    805
    806         self._consolidate_check()

C:\Python27\lib\site-packages\pandas\core\internals.pyc in _verify_integrity(self)
    892                                      "items")
    893             if block.values.shape[1:] != mgr_shape[1:]:
--> 894                 raise AssertionError('Block shape incompatible with manager')
    895         tot_items = sum(len(x.items) for x in self.blocks)
    896         if len(self.items) != tot_items:

AssertionError: Block shape incompatible with manager

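3 Answers

User

Answer 1 · edited 2024-05-19 01:40:46

To make the first approach work, try returning a Series instead of a tuple. apply raises the exception because it does not know how to glue the rows back together: the number of columns returned does not match the original frame.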

def calculate(s):
    a = s['path'] + 2*s['row'] # Simple calc for example
    b = s['path'] * 0.153
    return pd.Series(dict(col1=a, col2=b))
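As a minimal sketch of what this returns, the snippet below runs the same function on a small made-up frame. The sample values are an assumption for illustration; only the column names `path` and `row` come from the answer.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer.
st = pd.DataFrame({"path": [1.0, 2.0, 3.0], "row": [10.0, 20.0, 30.0]})

def calculate(s):
    a = s["path"] + 2 * s["row"]  # simple calc for example
    b = s["path"] * 0.153
    return pd.Series({"col1": a, "col2": b})

# Each call returns a Series, so apply assembles a DataFrame with one
# column per key (col1, col2) and the same index as st.
result = st.apply(calculate, axis=1)
print(result)
```

The result holds only the two new columns, so it still has to be attached to the original frame.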

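The second approach should work if you replace the chained assignment:

st.ix[i]['a'] = a

with a single indexing call:

st.ix[i, 'a'] = a

Note that .ix was removed in pandas 1.0; on current versions, write st.loc[i, 'a'] = a instead.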
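The question's loop is not reproduced on this page, so the following is only an assumed shape of that second approach, written with `.loc` so that it runs on current pandas. The sample data and the column `b` are hypothetical.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's frame.
st = pd.DataFrame({"path": [1.0, 2.0, 3.0], "row": [10.0, 20.0, 30.0]})

# Create the target columns first so every write hits an existing column.
st["a"] = float("nan")
st["b"] = float("nan")

for i in st.index:
    a = st.loc[i, "path"] + 2 * st.loc[i, "row"]
    b = st.loc[i, "path"] * 0.153
    # One indexing call with (row label, column label) writes into st itself;
    # chained indexing such as st.loc[i]["a"] = a may write to a temporary copy.
    st.loc[i, "a"] = a
    st.loc[i, "b"] = b

print(st)
```

A Python-level loop like this is usually much slower than `apply` or column arithmetic on a frame with tens of thousands of rows.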

User

Answer 2 · edited 2024-05-19 01:40:46

This was solved here: Apply pandas function to column to create multiple new columns?

Applied to your problem, this should work:

def calculate(s):
    a = s['path'] + 2*s['row'] # Simple calc for example
    b = s['path'] * 0.153
    return pd.Series({'col1': a, 'col2': b})

df = df.merge(df.apply(calculate, axis=1), left_index=True, right_index=True)
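A small runnable sketch of this merge follows, assuming a made-up two-row frame with string index labels. `DataFrame.join` is shown alongside it as a shorter way to attach columns that share the same index; that alternative is an addition here and is not part of the answer.

```python
import pandas as pd

# Hypothetical sample data with an explicit index.
df = pd.DataFrame({"path": [1.0, 2.0], "row": [10.0, 20.0]}, index=["x", "y"])

def calculate(s):
    a = s["path"] + 2 * s["row"]  # simple calc for example
    b = s["path"] * 0.153
    return pd.Series({"col1": a, "col2": b})

new_cols = df.apply(calculate, axis=1)

# Index-on-index merge, as in the answer.
merged = df.merge(new_cols, left_index=True, right_index=True)

# join aligns on the index by default and gives the same columns here.
joined = df.join(new_cols)

print(merged)
```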

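User

Answer 3 · edited 2024-05-19 01:40:46

I always use lambdas and the built-in map() function to create new columns by combining other columns:

# list() is needed on Python 3, where map() returns a lazy iterator
st['a'] = list(map(lambda path, row: path + 2 * row, st['path'], st['row']))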

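For linear combinations of numeric columns, this is probably slightly more complicated than necessary. On the other hand, I think it is a good convention to adopt, because it also handles more complex combinations of columns (for example, ones involving strings) and filling in missing data in one column using a function of other columns.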
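For the linear-combination case, plain column arithmetic gives the same values without a lambda. The sketch below compares the two on made-up data; the column `b` and the sample values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data.
st = pd.DataFrame({"path": [1.0, 2.0, 3.0], "row": [10.0, 20.0, 30.0]})

# Vectorized arithmetic: each expression operates on whole columns at once.
st["a"] = st["path"] + 2 * st["row"]
st["b"] = st["path"] * 0.153

# The map()-based version of the first column, for comparison.
via_map = list(map(lambda path, row: path + 2 * row, st["path"], st["row"]))

print(st)
print(via_map)
```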

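As an example of filling in missing data, suppose you have a table with the columns gender and title, and some of the titles are missing. You can fill them in with the following: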

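title_dict = {'male': 'mr.', 'female': 'ms.'}
# pd.notnull treats both None and NaN as missing;
# list() is needed on Python 3, where map() returns a lazy iterator
table['title'] = list(map(
    lambda title, gender: title if pd.notnull(title) else title_dict[gender],
    table['title'], table['gender']))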
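The same fill can also be sketched with pandas' own `Series.map` and `Series.fillna`, which skip the Python-level lambda. The table contents below are hypothetical.

```python
import pandas as pd

# Hypothetical table with one missing title.
table = pd.DataFrame({
    "gender": ["male", "female", "female"],
    "title": ["dr.", None, "ms."],
})
title_dict = {"male": "mr.", "female": "ms."}

# Map each gender to its default title, then use those defaults
# only where the existing title is missing.
defaults = table["gender"].map(title_dict)
table["title"] = table["title"].fillna(defaults)

print(table)
```

Existing titles such as "dr." are left untouched because `fillna` replaces only the missing entries.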
