How to use Pandas to find the frequency of words in a list column of a DataFrame

Posted 2024-04-19 07:42:24


My df looks like this:

category       text_list
--------       ---------
soccer         [soccer, game, is, good, soccer, game]
basketball     [game, basketball, game]
volleyball     [sport, volleyball, sport]
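To make the question reproducible, here is one way to build the sample frame shown above. It assumes each text_list cell holds a Python list of already-tokenized strings (the question does not say whether they are lists or strings that look like lists).

```python
import pandas as pd

# Sample data from the question: each text_list cell is a Python list of tokens
df = pd.DataFrame({
    'category': ['soccer', 'basketball', 'volleyball'],
    'text_list': [
        ['soccer', 'game', 'is', 'good', 'soccer', 'game'],
        ['game', 'basketball', 'game'],
        ['sport', 'volleyball', 'sport'],
    ],
})
print(df)
```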

What I want to do is group by category and then list each word with its frequency:

category       text_list          frequency
--------       ---------          ---------
soccer         soccer             2
               game               2 
               is                 1
               good               1
basketball     game               2
               basketball         1  
volleyball     sport              2
               volleyball         1

What I have tried

  • I was able to get the word frequencies for each row, but I could not lay them out in the DataFrame the way I want.
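The question does not show how the per-row frequencies were computed. Assuming they came from collections.Counter (an assumption, not the asker's confirmed method), the missing step is flattening those per-row counts into one row per (category, word) pair. This sketch stays dependency-free; nltk.FreqDist is a subclass of collections.Counter, so it could be swapped in for Counter if NLTK is preferred, but that variant is not shown here.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'category': ['soccer', 'basketball', 'volleyball'],
    'text_list': [
        ['soccer', 'game', 'is', 'good', 'soccer', 'game'],
        ['game', 'basketball', 'game'],
        ['sport', 'volleyball', 'sport'],
    ],
})

# Step 1 (assumed approach): per-row word counts, one Counter per row
per_row = [Counter(words) for words in df['text_list']]

# Step 2: flatten the Counters into (category, word, count) records
records = [
    (category, word, count)
    for category, counts in zip(df['category'], per_row)
    for word, count in counts.items()
]

# Step 3: build the long-format frame; summing merges categories that span several rows
freq = (
    pd.DataFrame(records, columns=['category', 'text_list', 'frequency'])
      .groupby(['category', 'text_list'])
      .sum()
)
print(freq)
```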

Can anyone help? If possible, using NLTK.


1 answer
User
#1 · Posted 2024-04-19 07:42:24

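Try explode followed by groupby. explode turns each list element into its own row (repeating the category value), and groupby(...).size() then counts the rows for each (category, word) pair: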

# One row per word (category repeated), then count rows per (category, word)
(df.explode('text_list')
   .groupby(['category', 'text_list']).size()
   .to_frame(name='frequency')
)

Output:

                       frequency
category   text_list            
basketball basketball          1
           game                2
soccer     game                2
           good                1
           is                  1
           soccer              2
volleyball sport               2
           volleyball          1
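The output above is sorted alphabetically by word, whereas the layout in the question lists the most frequent words first within each category. Below is a minimal end-to-end sketch that rebuilds the sample frame, applies the same explode/groupby steps, and then adds that ordering. The sort_values step is an addition, not part of the answer. Categories still come out alphabetically rather than in the question's original order, and words with equal counts are not guaranteed any particular order.

```python
import pandas as pd

df = pd.DataFrame({
    'category': ['soccer', 'basketball', 'volleyball'],
    'text_list': [
        ['soccer', 'game', 'is', 'good', 'soccer', 'game'],
        ['game', 'basketball', 'game'],
        ['sport', 'volleyball', 'sport'],
    ],
})

# explode: one row per word, with the category repeated for each word
exploded = df.explode('text_list')

# Count each (category, word) pair, then order words by descending
# frequency inside each category (added step, not in the answer)
freq = (
    exploded.groupby(['category', 'text_list']).size()
            .rename('frequency')
            .reset_index()
            .sort_values(['category', 'frequency'], ascending=[True, False])
            .set_index(['category', 'text_list'])
)
print(freq)
```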
