How do I sort data into bins using groupby in pandas?

Posted 2024-06-06 08:59:09


This is the result I want:

release_year listed_in

1920        Documentaries   
1930        TV Shows    
1940        TV Shows
1950        Classic Movies, Documentaries
1960        Documentaries
1970        Classic Movies, Documentaries
1980        Classic Movies, Documentaries
1990        Classic Movies, Documentaries
2000        Classic Movies, Documentaries
2010        Children & Family Movies, Classic Movies, Comedies
2020        Classic Movies, Dramas

To get this result, I tried the following code:

bins = [1925,1950,1960,1970,1990,2000,2010,2020]
groups = df.groupby(['listed_in', pd.cut(df.release_year, bins)])
groups.size().unstack()

It produced the following result:

release_year (1925,1950] (1950,1960] (1960,1970] (1970,1990] (1990,2000] (2000,2010] (2010, 2020] 
listed_in 
Action & Adventure 0 0 0 0 9 16 43
Action & Adventure, Anime Features, Children & Family Movies 0 0 0 0 0 0 1
Action & Adventure, Anime Features, Classic Movies 0 0 0 1 0 0 0
...

461 rows x 7 columns

I also tried the following code:

df['release_year'] = df['release_year'].astype(str).str[0:2] + '0'
df.groupby('release_year')['listed_in'].apply(lambda x: x.mode().iloc[0])

The result was:


release_year 
190         Dramas 
200     Documentaries
Name: listed_in, dtype:object

Here is a sample of the dataset:

import pandas as pd
import numpy as np
df = pd.DataFrame({
'show_id':['81145628','80117401','70234439'],
'type':['Movie','Movie','TV Show'],
'title':['Norm of the North: King Sized Adventure',
'Jandino: Whatever it Takes',
'Transformers Prime'],
'director':['Richard Finn, Tim Maltby',np.nan,np.nan],
'cast':['Alan Marriott, Andrew Toth, Brian Dobson',
'Jandino Asporaat','Peter Cullen, Sumalee Montano, Frank Welker'], 
'country':['United States, India, South Korea, China',
'United Kingdom','United States'], 
'date_added':['September 9, 2019',
'September 9, 2016',
'September 8, 2018'],
'release_year':['2019','2016','2013'],
'rating':['TV-PG','TV-MA','TV-Y7-FV'],
'duration':['90 min','94 min','1 Season'],
'listed_in':['Children & Family Movies, Comedies',
'Stand-Up Comedy','Kids TV'],
'description':['Before planning an awesome wedding for his',
'Jandino Asporaat riffs on the challenges of ra',
'With the help of three human allies, the Autob']})

1 answer
User
Answer 1 · posted 2024-06-06 08:59:09

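The simplest approach is to reuse the first line of your second attempt and set the last digit of release_year to 0. Your version sliced str[0:2], which keeps only the first two digits of the year, so every year collapsed to 190 or 200. Slicing str[0:3] keeps the decade digit as well. You can then .groupby the decade and take the most frequent listed_in value for each decade, i.e. the mode.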
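To see why the slice width matters, here is a minimal sketch comparing the question's str[0:2] with str[0:3]. The sample years are made up for illustration and are assumed to be four-digit strings, as in the question's data.

```python
import pandas as pd

# Hypothetical sample years, stored as strings like in the question's data.
years = pd.Series(['1945', '1999', '2013', '2019'])

# Slicing two characters keeps only the century prefix.
two_chars = years.str[0:2] + '0'
# Slicing three characters keeps the century and the decade digit.
three_chars = years.str[0:3] + '0'

print(two_chars.tolist())    # ['190', '190', '200', '200']
print(three_chars.tolist())  # ['1940', '1990', '2010', '2010']
```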

Input:

import pandas as pd
import numpy as np
df = pd.DataFrame({
'show_id':['81145628','80117401','70234439'],
'type':['Movie','Movie','TV Show'],
'title':['Norm of the North: King Sized Adventure',
'Jandino: Whatever it Takes',
'Transformers Prime'],
'director':['Richard Finn, Tim Maltby',np.nan,np.nan],
'cast':['Alan Marriott, Andrew Toth, Brian Dobson',
'Jandino Asporaat','Peter Cullen, Sumalee Montano, Frank Welker'], 
'country':['United States, India, South Korea, China',
'United Kingdom','United States'], 
'date_added':['September 9, 2019',
'September 9, 2016',
'September 8, 2018'],
'release_year':['2019','2016','2013'],
'rating':['TV-PG','TV-MA','TV-Y7-FV'],
'duration':['90 min','94 min','1 Season'],
'listed_in':['Children & Family Movies, Comedies',
'Stand-Up Comedy','Kids TV'],
'description':['Before planning an awesome wedding for his',
'Jandino Asporaat riffs on the challenges of ra',
'With the help of three human allies, the Autob']})

Code:

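# Keep the first three digits of the year and append '0' to get the decade, e.g. '2019' -> '2010'.
df['release_year'] = df['release_year'].astype(str).str[0:3] + '0'
# Take the most frequent listed_in value per decade; iloc[0] picks the first one when several tie.
df = df.groupby('release_year', as_index=False)['listed_in'].apply(lambda x: x.mode().iloc[0])
df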

Output:

    release_year  listed_in
0   2010          Children & Family Movies, Comedies
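In the three-row sample every listed_in value occurs once, so mode() returns all three and iloc[0] simply picks the first in sorted order. As a possible variation (not part of the answer above), the sketch below computes the decade with integer floor division, which does not depend on the year being exactly four characters long, and joins tied modes instead of dropping them. The sample rows and the name decade are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; only the two columns used here.
df = pd.DataFrame({
    'release_year': ['2019', '2016', '1994', '1998', '1991'],
    'listed_in': ['Dramas', 'Comedies',
                  'Documentaries', 'Classic Movies', 'Classic Movies'],
})

# Floor-divide by 10 to drop the last digit, then scale back up.
decade = (df['release_year'].astype(int) // 10) * 10

# Most frequent listed_in value per decade; tied values are joined with ', '.
top = (
    df.groupby(decade)['listed_in']
      .agg(lambda s: ', '.join(s.mode()))
      .rename_axis('decade')
)
print(top)
```

Here the 1990s have a single mode, while the two 2010s rows tie, so both values are kept for that decade.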
