<p>I have a pandas DataFrame, <code>st</code>, with multiple columns:</p>
<pre><code><class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 53732 entries, 1993-01-07 12:23:58 to 2012-12-02 20:06:23
Data columns:
Date(dd-mm-yy)_Time(hh-mm-ss) 53732 non-null values
Julian_Day 53732 non-null values
AOT_1020 53716 non-null values
AOT_870 53732 non-null values
AOT_675 53188 non-null values
AOT_500 51687 non-null values
AOT_440 53727 non-null values
AOT_380 51864 non-null values
AOT_340 52852 non-null values
Water(cm) 51687 non-null values
%TripletVar_1020 53710 non-null values
%TripletVar_870 53726 non-null values
%TripletVar_675 53182 non-null values
%TripletVar_500 51683 non-null values
%TripletVar_440 53721 non-null values
%TripletVar_380 51860 non-null values
%TripletVar_340 52846 non-null values
440-870Angstrom 53732 non-null values
380-500Angstrom 52253 non-null values
440-675Angstrom 53732 non-null values
500-870Angstrom 53732 non-null values
340-440Angstrom 53277 non-null values
Last_Processing_Date(dd/mm/yyyy) 53732 non-null values
Solar_Zenith_Angle 53732 non-null values
dtypes: datetime64[ns](1), float64(22), object(1)
</code></pre>
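<p>I want to add two new columns to this DataFrame, both computed by applying a function to each row. I don't want to call the function more than once per row (for example, through two separate <code>apply</code> calls), because it is computationally expensive. I have tried two approaches, and neither works:</p>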
<hr/>
<p><strong>Using <code>apply</code>:</strong></p>
<p>I wrote a function that takes a <code>Series</code> and returns a tuple of the values I want:</p>
<pre><code>def calculate(s):
    a = s['path'] + 2*s['row']  # Simple calc for example
    b = s['path'] * 0.153
    return (a, b)
</code></pre>
<p>Applying it to the DataFrame raises an error:</p>
<pre><code>st.apply(calculate, axis=1)
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-248-acb7a44054a7> in <module>()
----> 1 st.apply(calculate, axis=1)
C:\Python27\lib\site-packages\pandas\core\frame.pyc in apply(self, func, axis, broadcast, raw, args, **kwds)
4191 return self._apply_raw(f, axis)
4192 else:
-> 4193 return self._apply_standard(f, axis)
4194 else:
4195 return self._apply_broadcast(f, axis)
C:\Python27\lib\site-packages\pandas\core\frame.pyc in _apply_standard(self, func, axis, ignore_failures)
4274 index = None
4275
-> 4276 result = self._constructor(data=results, index=index)
4277 result.rename(columns=dict(zip(range(len(res_index)), res_index)),
4278 inplace=True)
C:\Python27\lib\site-packages\pandas\core\frame.pyc in __init__(self, data, index, columns, dtype, copy)
390 mgr = self._init_mgr(data, index, columns, dtype=dtype, copy=copy)
391 elif isinstance(data, dict):
--> 392 mgr = self._init_dict(data, index, columns, dtype=dtype)
393 elif isinstance(data, ma.MaskedArray):
394 mask = ma.getmaskarray(data)
C:\Python27\lib\site-packages\pandas\core\frame.pyc in _init_dict(self, data, index, columns, dtype)
521
522 return _arrays_to_mgr(arrays, data_names, index, columns,
--> 523 dtype=dtype)
524
525 def _init_ndarray(self, values, index, columns, dtype=None,
C:\Python27\lib\site-packages\pandas\core\frame.pyc in _arrays_to_mgr(arrays, arr_names, index, columns, dtype)
5411
5412 # consolidate for now
-> 5413 mgr = BlockManager(blocks, axes)
5414 return mgr.consolidate()
5415
C:\Python27\lib\site-packages\pandas\core\internals.pyc in __init__(self, blocks, axes, do_integrity_check)
802
803 if do_integrity_check:
--> 804 self._verify_integrity()
805
806 self._consolidate_check()
C:\Python27\lib\site-packages\pandas\core\internals.pyc in _verify_integrity(self)
892 "items")
893 if block.values.shape[1:] != mgr_shape[1:]:
--> 894 raise AssertionError('Block shape incompatible with manager')
895 tot_items = sum(len(x.items) for x in self.blocks)
896 if len(self.items) != tot_items:
AssertionError: Block shape incompatible with manager
</code></pre>
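<p>I was then going to assign the values returned from <code>apply</code> to two new columns using the method shown in <a href="https://stackoverflow.com/questions/12356501/pandas-create-two-new-columns-in-a-dataframe-with-values-calculated-from-a-pre">this question</a>. However, I can't even get to that point! It all works fine if I return just one value.</p>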
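<p>For reference, a minimal sketch of one possible way around this, assuming a recent pandas release (not the version in the traceback above): have the function return a <code>Series</code> instead of a tuple, so that a single <code>apply</code> call produces a two-column DataFrame. The small frame <code>df</code> and its values are hypothetical stand-ins for <code>st</code>, containing only the two columns the example function reads.</p>

```python
import pandas as pd

# Hypothetical stand-in for `st`; only the columns used by calculate() are included.
df = pd.DataFrame({"path": [1.0, 2.0, 3.0], "row": [10.0, 20.0, 30.0]})

def calculate(s):
    a = s["path"] + 2 * s["row"]  # same simple calculation as in the question
    b = s["path"] * 0.153
    # Returning a labelled Series (rather than a tuple) lets apply()
    # assemble the per-row results into a DataFrame with columns 'a' and 'b'.
    return pd.Series({"a": a, "b": b})

# One apply() call computes both values for every row.
result = df.apply(calculate, axis=1)

# Attach the two new columns to the original frame, aligned on the index.
df = df.join(result)
```

<p>Because both values come back from the same call, the expensive function does not need a second pass over the rows.</p>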
<hr/>
<p><strong>Using a loop:</strong></p>
<p>I first created two new columns in the DataFrame and set them to <code>None</code>:</p>
<pre><code>st['a'] = None
st['b'] = None
</code></pre>
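<p>I then looped over every index and tried to overwrite those <code>None</code> values, but the modifications didn't seem to take effect. That is, no error was raised, but the DataFrame appeared unchanged.</p>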
<pre><code>for i in st.index:
    # do calc here
    st.ix[i]['a'] = a
    st.ix[i]['b'] = b
</code></pre>
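<p>A likely explanation is chained indexing: <code>st.ix[i]</code> first returns a row object, which may be a copy, and the following <code>['a'] = a</code> then writes to that temporary object rather than to the DataFrame. The sketch below shows the same loop with a single indexer call per assignment, assuming a recent pandas release (where <code>.ix</code> has been removed). The frame <code>df</code> and its values are hypothetical stand-ins for <code>st</code>.</p>

```python
import pandas as pd

# Hypothetical stand-in for `st`, with a small DatetimeIndex like the original.
df = pd.DataFrame(
    {"path": [1.0, 2.0, 3.0], "row": [10.0, 20.0, 30.0]},
    index=pd.to_datetime(["1993-01-07", "1993-01-08", "1993-01-09"]),
)

# Pre-create the new columns as float columns filled with NaN.
df["a"] = float("nan")
df["b"] = float("nan")

for i in df.index:
    # Same simple calculation as in the question's example function.
    a = df.at[i, "path"] + 2 * df.at[i, "row"]
    b = df.at[i, "path"] * 0.153
    # A single .at[row_label, column_label] call writes directly into df,
    # so there is no intermediate row object to be modified by mistake.
    df.at[i, "a"] = a
    df.at[i, "b"] = b
```

<p>An explicit Python loop like this is usually much slower than <code>apply</code> or a vectorised expression, so it is mainly useful for showing why the original assignments were lost.</p>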
<hr/>
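<p>I thought both of these methods would work, but neither does. So, what am I doing wrong here? And what is the best, most 'pythonic' and 'pandaonic' way to do this?</p>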