`nth` breaks the sort order of a DataFrame in pandas

Posted 2024-04-28 18:01:11


I have a DataFrame `census_df` containing US population data:

         STNAME             CTYNAME  CENSUS2010POP
0       Alabama      Autauga County          54571
1       Alabama      Baldwin County         182265
2       Alabama      Barbour County          27457
3       Alabama         Bibb County          22915
4       Alabama       Blount County          57322
5       Alabama      Bullock County          10914
6       Alabama       Butler County          20947
7       Alabama      Calhoun County         118572
8       Alabama     Chambers County          34215
9       Alabama     Cherokee County          25989
10      Alabama      Chilton County          43643
11      Alabama      Choctaw County          13859
12      Alabama       Clarke County          25833
13      Alabama         Clay County          13932
14      Alabama     Cleburne County          14972
15      Alabama       Coffee County          49948
16      Alabama      Colbert County          54428
17      Alabama      Conecuh County          13228
18      Alabama        Coosa County          11539
19      Alabama    Covington County          37765
20      Alabama     Crenshaw County          13906
21      Alabama      Cullman County          80406
22      Alabama         Dale County          50251
23      Alabama       Dallas County          43820
24      Alabama       DeKalb County          71109
25      Alabama       Elmore County          79303
26      Alabama     Escambia County          38319
27      Alabama       Etowah County         104430
28      Alabama      Fayette County          17241
29      Alabama     Franklin County          31704
...         ...                 ...            ...
3112  Wisconsin     Washburn County          15911
3113  Wisconsin   Washington County         131887
3114  Wisconsin     Waukesha County         389891
3115  Wisconsin      Waupaca County          52410
3116  Wisconsin     Waushara County          24496
3117  Wisconsin    Winnebago County         166994
3118  Wisconsin         Wood County          74749
3119    Wyoming       Albany County          36299
3120    Wyoming     Big Horn County          11668
3121    Wyoming     Campbell County          46133
3122    Wyoming       Carbon County          15885
3123    Wyoming     Converse County          13833
3124    Wyoming        Crook County           7083
3125    Wyoming      Fremont County          40123
3126    Wyoming       Goshen County          13249
3127    Wyoming  Hot Springs County           4812
3128    Wyoming      Johnson County           8569
3129    Wyoming      Laramie County          91738
3130    Wyoming      Lincoln County          18106
3131    Wyoming      Natrona County          75450
3132    Wyoming     Niobrara County           2484
3133    Wyoming         Park County          28205
3134    Wyoming       Platte County           8667
3135    Wyoming     Sheridan County          29116
3136    Wyoming     Sublette County          10247
3137    Wyoming   Sweetwater County          43806
3138    Wyoming        Teton County          21294
3139    Wyoming        Uinta County          21118
3140    Wyoming     Washakie County           8533
3141    Wyoming       Weston County           7208

[3142 rows x 3 columns]

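The columns hold the state name, the county name, and the 2010 census population. I want to find the three most populous counties in each state and add up their populations, so that I get one number per state. To get the most populous counties of each state, I sorted the frame by population and then used `nth` on the `STNAME` groups to select the first three rows.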

This gave me the following (only the last few rows are shown):

           CENSUS2010POP          CTYNAME
STNAME                                   
Wisconsin         488073      Dane County
Wisconsin         389891  Waukesha County
Wyoming            91738   Laramie County
Wyoming            46133  Campbell County
Wyoming            75450   Natrona County

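As you can see, for the last state, Wyoming, the population order within the state is no longer sorted after using `nth`. The same happens for many other states. Can anyone tell me what is going on? How can I keep the sorted order when selecting the top three values?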
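One version-independent way to keep the sorted order is to avoid `nth` altogether and filter with a boolean mask. This is a swapped-in technique, not the code from the question. The sketch below assumes a small hypothetical sample built from rows of the table above; `GroupBy.cumcount()` numbers the rows of each state 0, 1, 2, ... in their current order, and boolean indexing keeps the selected rows in that same order.

```python
import pandas as pd

# Hypothetical sample: a few rows copied from the census_df table in the question
census_df = pd.DataFrame({
    "STNAME": ["Alabama"] * 4 + ["Wisconsin"] * 4 + ["Wyoming"] * 5,
    "CTYNAME": ["Autauga County", "Baldwin County", "Calhoun County", "Etowah County",
                "Washington County", "Waukesha County", "Winnebago County", "Wood County",
                "Albany County", "Campbell County", "Laramie County", "Natrona County",
                "Park County"],
    "CENSUS2010POP": [54571, 182265, 118572, 104430,
                      131887, 389891, 166994, 74749,
                      36299, 46133, 91738, 75450, 28205],
})

# Sort by state, then by population from largest to smallest
sorted_df = census_df.sort_values(["STNAME", "CENSUS2010POP"], ascending=[True, False])

# cumcount() gives each row its position within its state (0, 1, 2, ...)
# in the current sorted order; the mask keeps positions 0-2 without reordering
top3 = sorted_df[sorted_df.groupby("STNAME").cumcount() < 3]
print(top3)
```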


2 answers

You can combine `nlargest` with `sort_index` as shown here; this is faster than `.sort_values(ascending=False).head(n)`:

print (census_df.set_index('CTYNAME')
                .groupby('STNAME')['CENSUS2010POP']
                .nlargest(3)
                .sort_index(ascending=False)
                .reset_index())

      STNAME            CTYNAME  CENSUS2010POP
0    Wyoming     Natrona County          75450
1    Wyoming     Laramie County          91738
2    Wyoming    Campbell County          46133
3  Wisconsin   Winnebago County         166994
4  Wisconsin    Waukesha County         389891
5  Wisconsin  Washington County         131887
6    Alabama      Etowah County         104430
7    Alabama     Calhoun County         118572
8    Alabama     Baldwin County         182265
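Note that `sort_index(ascending=False)` orders the result by its index labels, that is by state and then by county name, both in reverse alphabetical order. That is why Wyoming comes out as Natrona, Laramie, Campbell in the output above rather than in population order. A minimal sketch, assuming a hypothetical sample of rows taken from the question's table, that keeps the population order by leaving that call out (`nlargest` already returns each group's values largest first):

```python
import pandas as pd

# Hypothetical sample: a few rows copied from the census_df table in the question
census_df = pd.DataFrame({
    "STNAME": ["Alabama"] * 4 + ["Wisconsin"] * 4 + ["Wyoming"] * 5,
    "CTYNAME": ["Autauga County", "Baldwin County", "Calhoun County", "Etowah County",
                "Washington County", "Waukesha County", "Winnebago County", "Wood County",
                "Albany County", "Campbell County", "Laramie County", "Natrona County",
                "Park County"],
    "CENSUS2010POP": [54571, 182265, 118572, 104430,
                      131887, 389891, 166994, 74749,
                      36299, 46133, 91738, 75450, 28205],
})

# nlargest(3) returns a Series indexed by (STNAME, CTYNAME), with the groups in
# ascending state order and each group's values in descending order
top3 = (census_df.set_index("CTYNAME")
                 .groupby("STNAME")["CENSUS2010POP"]
                 .nlargest(3)
                 .reset_index())
print(top3)
```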

Sum of the top 3 values:

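One way to get these sums (a sketch, not necessarily the code the original answer used) is to group the `nlargest(3)` result again on its first index level, which holds the state name. The sample rows are a hypothetical subset of the question's table, so the totals are not the real per-state figures:

```python
import pandas as pd

# Hypothetical sample: a few rows copied from the census_df table in the question
census_df = pd.DataFrame({
    "STNAME": ["Alabama"] * 4 + ["Wisconsin"] * 4 + ["Wyoming"] * 5,
    "CTYNAME": ["Autauga County", "Baldwin County", "Calhoun County", "Etowah County",
                "Washington County", "Waukesha County", "Winnebago County", "Wood County",
                "Albany County", "Campbell County", "Laramie County", "Natrona County",
                "Park County"],
    "CENSUS2010POP": [54571, 182265, 118572, 104430,
                      131887, 389891, 166994, 74749,
                      36299, 46133, 91738, 75450, 28205],
})

top3_sum = (census_df.groupby("STNAME")["CENSUS2010POP"]
            .nlargest(3)        # index: (STNAME, original row label)
            .groupby(level=0)   # level 0 holds the state name
            .sum())             # one total per state
print(top3_sum)
```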

I believe what you want is:

group = census_df.groupby('STNAME').head(3)

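This returns the first 3 rows of each group in the frame's current row order, so `census_df` has to be sorted by population in descending order first.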

To get the total for each state, just run another `groupby` on `group` with the `sum` aggregation:

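A sketch that puts this answer's two steps together, with the descending sort that `head(3)` relies on made explicit. It runs on a hypothetical subset of the question's rows and is not the original answer's code:

```python
import pandas as pd

# Hypothetical sample: a few rows copied from the census_df table in the question
census_df = pd.DataFrame({
    "STNAME": ["Alabama"] * 4 + ["Wisconsin"] * 4 + ["Wyoming"] * 5,
    "CTYNAME": ["Autauga County", "Baldwin County", "Calhoun County", "Etowah County",
                "Washington County", "Waukesha County", "Winnebago County", "Wood County",
                "Albany County", "Campbell County", "Laramie County", "Natrona County",
                "Park County"],
    "CENSUS2010POP": [54571, 182265, 118572, 104430,
                      131887, 389891, 166994, 74749,
                      36299, 46133, 91738, 75450, 28205],
})

# head(3) keeps the first three rows of each state in the frame's current order,
# so sort by population (largest first) before grouping
group = census_df.sort_values("CENSUS2010POP", ascending=False).groupby("STNAME").head(3)

# Aggregate the selected rows: one total per state
state_totals = group.groupby("STNAME")["CENSUS2010POP"].sum()
print(state_totals)
```

Because `head` returns a subset of the original rows with their order and index preserved, the population order produced by `sort_values` survives the selection.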
