<p>I don't know of a cleaner answer than this, but the function below is not that bad:</p>
<pre><code>import pandas as pd
import numpy as np
def weighted_means_by_column_ignoring_NaNs(x, cols, w="weights"):
    """Take a DataFrame and average each data column (cols),
    weighting observations by column w, but ignoring individual NaN
    observations within each column.
    """
    # For each column, drop only the rows where that column is NaN,
    # then take the weighted average of what is left (NaN if nothing is left).
    return pd.Series([np.nan if x.dropna(subset=[c]).empty else
                      np.average(x.dropna(subset=[c])[c],
                                 weights=x.dropna(subset=[c])[w])
                      for c in cols], cols)
</code></pre>
<p>Example usage:</p>
<pre><code>df = pd.DataFrame({'category': ['a', 'a', 'b', 'b'],
                   'var1': np.random.randint(0, 100, 4),
                   'var2': np.random.randint(0, 100, 4),
                   'weights': np.random.randint(0, 10, 4)})
df.loc[1, 'var1'] = np.nan
df
  category  var1  var2  weights
0        a  74.0    99        9
1        a   NaN     8        4
2        b  13.0    86        2
3        b  49.0    38        7
df.groupby('category').apply(weighted_means_by_column_ignoring_NaNs,
                             ['var1', 'var2'])
          var1       var2
category
a         74.0  71.000000
b         41.0  48.666667
</code></pre>
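<p>Because the example frame is built from random draws, it can be hard to check the numbers by hand. The sketch below is one way to cross-check them: it hard-codes the values from the frame printed above and computes the same NaN-aware weighted mean with a boolean mask. The helper name <code>weighted_nanmean</code> is hypothetical (it is not part of pandas or NumPy), and returning NaN when the remaining weights sum to zero is an assumption of this sketch, not something the function above does.</p>

```python
import numpy as np
import pandas as pd


def weighted_nanmean(values, weights):
    """Weighted mean of `values`, skipping positions where the value is NaN.

    Hypothetical helper for illustration. Returns NaN when no observation
    is left or when the remaining weights sum to zero.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    keep = ~np.isnan(values)           # mask of usable observations
    total = weights[keep].sum()
    if total == 0:                     # nothing to average
        return np.nan
    return float((values[keep] * weights[keep]).sum() / total)


# Fixed data copied from the printed frame, so the result is reproducible.
df = pd.DataFrame({'category': ['a', 'a', 'b', 'b'],
                   'var1': [74.0, np.nan, 13.0, 49.0],
                   'var2': [99, 8, 86, 38],
                   'weights': [9, 4, 2, 7]})

# Loop over the groups explicitly and collect one row of means per category.
rows = {}
for name, group in df.groupby('category'):
    rows[name] = {c: weighted_nanmean(group[c], group['weights'])
                  for c in ['var1', 'var2']}

result = pd.DataFrame.from_dict(rows, orient='index')
result.index.name = 'category'
print(result)
```

<p>For this fixed data the means work out to 74.0 and 71.0 for category <code>a</code>, and 41.0 and 438/9 (about 48.67) for category <code>b</code>. In category <code>a</code> the NaN row is dropped only for <code>var1</code>, so <code>var2</code> still uses both observations.</p>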