<p>The problem is that you are multiplying a frame by a frame of a different size with a different row index. Here is the solution:</p>
<pre><code>In [120]: from pandas import DataFrame, Series
In [121]: df = DataFrame([[1,2.2,3.5],[6.1,0.4,1.2]], columns=list('abc'))
In [122]: weight = DataFrame(Series([0.5, 0.3, 0.2], index=list('abc'), name=0))
In [123]: df
Out[123]:
      a     b     c
0  1.00  2.20  3.50
1  6.10  0.40  1.20
In [124]: weight
Out[124]:
      0
a  0.50
b  0.30
c  0.20
In [125]: df * weight
Out[125]:
     0    a    b    c
0  nan  nan  nan  nan
1  nan  nan  nan  nan
a  nan  nan  nan  nan
b  nan  nan  nan  nan
c  nan  nan  nan  nan
</code></pre>
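<p>You can either access the column, which gives a <code>Series</code> whose index lines up with the columns of <code>df</code>:</p>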
<pre><code>In [126]: df * weight[0]
Out[126]:
      a     b     c
0  0.50  0.66  0.70
1  3.05  0.12  0.24
In [128]: (df * weight[0]).sum(1)
Out[128]:
0    1.86
1    3.41
dtype: float64
</code></pre>
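<p>The multiplication above works because a <code>Series</code> is matched to the frame's columns by label, not by position. A minimal sketch of that behaviour, assuming a reasonably recent pandas; the name <code>shuffled</code> is hypothetical and holds the same weights in a different order:</p>

```python
import pandas as pd

df = pd.DataFrame([[1, 2.2, 3.5], [6.1, 0.4, 1.2]], columns=list("abc"))

# Same weights as in the answer (a=0.5, b=0.3, c=0.2), deliberately reordered.
shuffled = pd.Series([0.2, 0.5, 0.3], index=list("cab"))

# The Series index is aligned with df's column labels before multiplying.
weighted = df * shuffled

# The same operation written with the explicit method form.
explicit = df.mul(shuffled, axis="columns")

# Summing across columns gives one weighted sum per row.
row_sums = weighted.sum(axis=1)
print(row_sums)
```

<p>Because alignment is by label, the order of the weights does not matter as long as their index matches the column names.</p>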
<p>Or use <code>dot</code> to get back another <code>DataFrame</code>:</p>
<pre><code>In [127]: df.dot(weight)
Out[127]:
      0
0  1.86
1  3.41
</code></pre>
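<p>The return type of <code>dot</code> follows its argument: a one-column <code>DataFrame</code> gives a <code>DataFrame</code>, while a <code>Series</code> gives a <code>Series</code>. A small sketch reusing the data from the answer (the names <code>as_frame</code> and <code>as_series</code> are only for illustration):</p>

```python
import pandas as pd

df = pd.DataFrame([[1, 2.2, 3.5], [6.1, 0.4, 1.2]], columns=list("abc"))
weight = pd.DataFrame(pd.Series([0.5, 0.3, 0.2], index=list("abc"), name=0))

# Frame times frame: result is a DataFrame with a single column labelled 0.
as_frame = df.dot(weight)

# Frame times Series: result is a Series indexed like df.
as_series = df.dot(weight[0])

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, as_series.shape)
```

<p>The <code>Series</code> form may be the more convenient one when the result is going to be assigned to a single new column.</p>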
<p>To put it all together:</p>
<pre><code>In [130]: df['weighted_sum'] = df.dot(weight)
In [131]: df
Out[131]:
      a     b     c  weighted_sum
0  1.00  2.20  3.50          1.86
1  6.10  0.40  1.20          3.41
</code></pre>
<p>Here are the <code>timeit</code> results for each method, using a larger <code>DataFrame</code>.</p>
<pre><code>In [144]: from numpy.random import randn
In [145]: df = DataFrame(randn(10000000, 3), columns=list('abc'))
In [146]: weight = DataFrame(Series([0.5, 0.3, 0.2], index=list('abc'), name=0))
In [147]: timeit df.dot(weight)
10 loops, best of 3: 57.5 ms per loop
In [148]: timeit (df * weight[0]).sum(1)
10 loops, best of 3: 125 ms per loop
</code></pre>
<p>And for a wide <code>DataFrame</code>:</p>
<pre><code>In [162]: df = DataFrame(randn(10000, 1000))
In [163]: weight = DataFrame(randn(1000, 1))
In [164]: timeit df.dot(weight)
100 loops, best of 3: 5.14 ms per loop
In [165]: timeit (df * weight[0]).sum(1)
10 loops, best of 3: 41.8 ms per loop
</code></pre>
<p>So <code>dot</code> is both faster (roughly 2x and 8x in the timings above) and more readable.</p>
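<p><strong>Note:</strong> If any of your data contain <code>NaN</code>s, do not use <code>dot</code>; use the multiply-and-sum method instead. <code>dot</code> cannot handle <code>NaN</code>s because it is just a thin wrapper around <code>numpy.dot()</code> (which does not handle <code>NaN</code>s).</p>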
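<p>To illustrate the note, here is a minimal sketch with one missing value inserted into the example data (the <code>NaN</code> in column <code>b</code> is an assumption made purely for demonstration). With <code>dot</code> the <code>NaN</code> propagates into the result, whereas <code>sum</code> skips it by default:</p>

```python
import numpy as np
import pandas as pd

# Example data from the answer, with one value replaced by NaN.
df = pd.DataFrame([[1, np.nan, 3.5], [6.1, 0.4, 1.2]], columns=list("abc"))
w = pd.Series([0.5, 0.3, 0.2], index=list("abc"))

# dot: the NaN in row 0 makes the whole weighted sum for that row NaN.
via_dot = df.dot(w)

# multiply then sum: sum() uses skipna=True by default, so the missing
# term is simply left out of the row total.
via_mul_sum = (df * w).sum(axis=1)

print(via_dot)
print(via_mul_sum)
```

<p>Keep in mind that skipping a missing term means the weights actually applied to that row no longer add up to 1, so whether this is the desired result depends on the data.</p>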