Frequency of words in a list based on data in another column

1 Answer

Anonymous, 1 day ago

I think you need:
<pre><code>df = (df.set_index('Frequency')['Utterance']
        .str.split(expand=True)          # one column per word
        .stack()                         # one row per word occurrence
        .groupby(level=0)
        .value_counts()                  # occurrences of each word per Frequency
        .reset_index(name='new')
        .assign(Frequency=lambda x: x.Frequency * x['new'])
        .groupby('level_1', as_index=False)['Frequency'].sum()
        .rename(columns={'level_1': 'Words'})
      )
print(df)

         Words  Frequency
0       Direct        201
1   Directions       1045
2      Display        376
3         Give        612
4     Navigate        678
5         Show        754
6    Starbucks       3666
7   directions       1366
8           me       2065
9   navigation        376
10          to       3666
</code></pre>
If each row contains only unique words, the solution can be simplified:
<pre><code>df = (df.set_index('Frequency')['Utterance']
        .str.split(expand=True)
        .stack()
        .reset_index(name='Words')
        .groupby('Words', as_index=False)['Frequency'].sum()
      )
print(df)

         Words  Frequency
0       Direct        201
1   Directions       1045
2      Display        376
3         Give        612
4     Navigate        678
5         Show        754
6    Starbucks       3666
7   directions       1366
8           me       2065
9   navigation        376
10          to       3666
</code></pre>
Explanation:
<ol>
<li>Create an index from the <code>Frequency</code> column</li>
<li><a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.split.html" rel="nofollow noreferrer"><code>split</code></a> each sentence into words, expanded into a <code>DataFrame</code></li>
<li>Reshape with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.stack.html" rel="nofollow noreferrer"><code>stack</code></a></li>
<li>Get the counts per group with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.SeriesGroupBy.value_counts.html" rel="nofollow noreferrer"><code>value_counts</code></a></li>
<li>Multiply the count column by <code>Frequency</code> with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.assign.html" rel="nofollow noreferrer"><code>assign</code></a></li>
<li>Aggregate per word with <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.sum.html" rel="nofollow noreferrer"><code>sum</code></a></li>
</ol>
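To make the arithmetic behind the steps above easier to check by hand, here is a minimal plain-Python sketch of the same idea: each word's total is the sum, over all rows, of that row's frequency times the number of times the word occurs in it. The sample rows and the helper name <code>weighted_word_totals</code> are made up for illustration; they are not the data or code from the question.

```python
from collections import Counter

# Hypothetical sample rows as (utterance, frequency) pairs.
# These are NOT the rows from the original question.
rows = [
    ("Directions to Starbucks", 10),
    ("Show me directions to Starbucks", 5),
    ("to to Starbucks", 2),  # a word repeated on purpose
]


def weighted_word_totals(rows):
    """Return {word: total}, adding a row's frequency once per occurrence."""
    totals = Counter()
    for utterance, frequency in rows:
        # Count the words within this row, then weight by the row's frequency.
        for word, count in Counter(utterance.split()).items():
            totals[word] += frequency * count
    # Sort by word; uppercase letters sort before lowercase ones.
    return dict(sorted(totals.items()))


totals = weighted_word_totals(rows)
print(totals)
```

In this sketch, "to" appears twice in the last row, so that row contributes 2 * 2 = 4 to its total. Splitting is case-sensitive, which is why "Directions" and "directions" stay separate, as they do in the output shown in the answer.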
