<p>I think you need:</p>
<pre><code>df = (df.set_index('Frequency')['Utterance']
        .str.split(expand=True)
        .stack()
        .groupby(level=0)
        .value_counts()
        .reset_index(name='new')
        .assign(Frequency=lambda x: x.Frequency * x['new'])
        .groupby('level_1', as_index=False)['Frequency'].sum()
        .rename(columns={'level_1': 'Words'})
      )
print(df)
         Words  Frequency
0       Direct        201
1   Directions       1045
2      Display        376
3         Give        612
4     Navigate        678
5         Show        754
6    Starbucks       3666
7   directions       1366
8           me       2065
9   navigation        376
10          to       3666
</code></pre>
<p>If each row contains only unique words, the solution can be simplified:</p>
<pre><code>df = (df.set_index('Frequency')['Utterance']
        .str.split(expand=True)
        .stack()
        .reset_index(name='Words')
        .groupby('Words', as_index=False)['Frequency'].sum()
      )
print(df)
         Words  Frequency
0       Direct        201
1   Directions       1045
2      Display        376
3         Give        612
4     Navigate        678
5         Show        754
6    Starbucks       3666
7   directions       1366
8           me       2065
9   navigation        376
10          to       3666
</code></pre>
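<p>As a side note, the same per-word totals can probably also be obtained with <code>DataFrame.explode</code>. This is a different technique from the one above, not part of the original solution, and it assumes pandas 0.25 or later. The sample data below is hypothetical:</p>

```python
import pandas as pd

# Hypothetical sample data (an assumption, not the data from the question).
df = pd.DataFrame({
    'Utterance': ['to be or not to be', 'be happy'],
    'Frequency': [10, 3],
})

result = (df.assign(Words=df['Utterance'].str.split())  # list of words per row
            .explode('Words')                            # one row per word occurrence
            .groupby('Words', as_index=False)['Frequency'].sum())
print(result)
```

<p>Here a word that occurs twice in one utterance contributes that utterance's <code>Frequency</code> twice.</p>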
<p>Explanation:</p>
<ol>
<li>Create the index from the <code>Frequency</code> column</li>
<li>Split each sentence into words, expanded into a <code>DataFrame</code>, with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.split.html" rel="nofollow noreferrer"><code>str.split</code></a></li>
<li>Reshape with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.stack.html" rel="nofollow noreferrer"><code>stack</code></a></li>
<li>Get the word counts per group with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.SeriesGroupBy.value_counts.html" rel="nofollow noreferrer"><code>value_counts</code></a></li>
<li>Multiply the count column by <code>Frequency</code> with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.assign.html" rel="nofollow noreferrer"><code>assign</code></a></li>
<li>Aggregate by word with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.sum.html" rel="nofollow noreferrer"><code>sum</code></a></li>
</ol>
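<p>The steps above can be sketched one at a time on a small example. The sample data and the intermediate names (<code>words</code>, <code>long</code>, <code>counts</code>, <code>result</code>) are assumptions for illustration, and <code>rename_axis</code> is used here to name the index levels explicitly instead of relying on the automatic <code>level_1</code> label:</p>

```python
import pandas as pd

# Hypothetical sample data (an assumption, not the data from the question).
df = pd.DataFrame({
    'Utterance': ['to be or not to be', 'be happy'],
    'Frequency': [10, 3],
})

# Steps 1-2: index by Frequency, then split each sentence into one word per column.
words = df.set_index('Frequency')['Utterance'].str.split(expand=True)

# Step 3: reshape to a long Series indexed by (Frequency, column position).
long = words.stack()

# Step 4: count how often each word occurs within each Frequency group.
counts = (long.groupby(level=0)
              .value_counts()
              .rename_axis(['Frequency', 'Words'])  # name both index levels
              .reset_index(name='new'))

# Steps 5-6: weight each count by its Frequency, then sum per word.
result = (counts.assign(Frequency=lambda x: x['Frequency'] * x['new'])
                .groupby('Words', as_index=False)['Frequency'].sum())
print(result)
```

<p>In this sketch <code>be</code> appears twice in the first utterance and once in the second, so its total is 2 * 10 + 3 = 23.</p>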