How to use pandas value_counts to count how often events (column a) occur, grouped by year (column b)

Posted 2024-03-28 20:00:18

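I have preprocessed this df, which contains information on the history of US emergencies and disasters. For 1960-2017 it now holds these columns: location, disaster type, start date, end date, disaster length, and year.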

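Now I want to create 2 new dfs:

  1. = the number of disasters that occurred each year
  2. = the number of disasters of each type that occurred each year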

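This is my current attempt at counting the disasters per year and creating a new df from the result, but I'm not sure how to make it count specifically the number of disasters in each year: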

#Number of each Disaster each year

df_yearly_dcount=df_time.groupby(df_time['Start_year']).count()

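As for the second one, I'm not sure how to count each type of disaster per year, because I need to work out the first count before I can move on and split it by type.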

Here is the full code:

import numpy as np
import matplotlib.pyplot as plt 
import pandas as pd 
import seaborn as sns 

from scipy.stats import zscore

#Import Dataset
df = pd.read_csv('database.csv')

#Work on an explicit copy so later column assignments do not trigger SettingWithCopyWarning
df_time = df[['County','Disaster Type','Start Date', 'End Date']].copy()

#Preprocessing      
     
#Number of NaN values          
df_nan = df[['County','Disaster Type','Start Date', 'End Date']].isna().sum()

#NaN values as a percentage of the total
df_nan_number = [(df_nan.sum(axis=0)), str((((539/45330)*100))) +'%']

#Remove NaN values
df_time.dropna(subset = ["County", 'End Date'], inplace=True)

#Set Date Format
df_time['Start_Date_A'] = pd.to_datetime(df['Start Date'], format='%m/%d/%Y')
df_time['End_Date_A'] = pd.to_datetime(df['End Date'], format='%m/%d/%Y')

#Create new column == Disaster Length
df_time['Disaster_Length'] = (df_time.Start_Date_A - df_time.End_Date_A).dt.days

#Create new column == start year
df_time['Start_year'] = df_time['Start_Date_A'].dt.year

#Dropped  Old Date Formats from df
df_time = df_time.drop(columns=['Start Date', 'End Date'], axis=1)

#Replace 0 day values with 1 to indicate a Disaster length of 1 Day
df_time['Disaster_Length'] = df_time['Disaster_Length'].replace({0:1})

#Replace all values with absolute values so all days are represented as positive numeric values
df_time['Disaster_Length'] = df_time['Disaster_Length'].abs()


# Locating man-made and non-'natural' disasters, sorting Disaster types, and analyzing value counts
df_DTypes= df_time['Disaster Type'].values

df_DTypes=pd.DataFrame(df_DTypes)


df_DType_VCounts=(df_DTypes.value_counts()).sort_values(ascending=True)


Df_DType_Natural=(df_DType_VCounts.drop(['Human Cause', 'Chemical', 'Dam/Levee Break', 'Terrorism','Other'],axis=0)).sort_values(ascending=True)

df_time = df_time.rename(columns={'Disaster Type': 'Disaster_Type'})

#Removing non-natural disasters from main df_time
df_time = df_time[~df_time.Disaster_Type.isin(['Human Cause', 'Chemical', 'Dam/Levee Break', 'Terrorism', 'Other'])]

#Analysis 

#Dataframe with mean disaster length for each year
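#numeric_only=True skips the text columns, which recent pandas versions refuse to average
df_yearly_mean = df_time.groupby(['Start_year']).mean(numeric_only=True)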


#Number of Disasters per year
df_yearly_dcount=df_time.groupby(df_time['Start_year']).count().reset_index(name='Disaster_Type')


#Number of each Disaster each year

Here is a reproducible sample of the df:


,County,Disaster_Type,Start_Date_A,End_Date_A,Disaster_Length,Start_year
89,Clay County,Flood,1959-01-29,1959-01-29,1,1959
181,Alpine County,Flood,1964-12-24,1964-12-24,1,1964
182,Amador County,Flood,1964-12-24,1964-12-24,1,1964
183,Butte County,Flood,1964-12-24,1964-12-24,1,1964
184,Colusa County,Flood,1964-12-24,1964-12-24,1,1964
185,Del Norte County,Flood,1964-12-24,1964-12-24,1,1964
186,El Dorado County,Flood,1964-12-24,1964-12-24,1,1964
187,Glenn County,Flood,1964-12-24,1964-12-24,1,1964
188,Humboldt County,Flood,1964-12-24,1964-12-24,1,1964
189,Lake County,Flood,1964-12-24,1964-12-24,1,1964
190,Lassen County,Flood,1964-12-24,1964-12-24,1,1964
191,Marin County,Flood,1964-12-24,1964-12-24,1,1964
192,Mendocino County,Flood,1964-12-24,1964-12-24,1,1964
193,Modoc County,Flood,1964-12-24,1964-12-24,1,1964
194,Napa County,Flood,1964-12-24,1964-12-24,1,1964
195,Nevada County,Flood,1964-12-24,1964-12-24,1,1964
196,Placer County,Flood,1964-12-24,1964-12-24,1,1964
197,Plumas County,Flood,1964-12-24,1964-12-24,1,1964
198,Sacramento County,Flood,1964-12-24,1964-12-24,1,1964
199,San Joaquin County,Flood,1964-12-24,1964-12-24,1,1964
200,Shasta County,Flood,1964-12-24,1964-12-24,1,1964
201,Sierra County,Flood,1964-12-24,1964-12-24,1,1964
202,Siskiyou County,Flood,1964-12-24,1964-12-24,1,1964
203,Solano County,Flood,1964-12-24,1964-12-24,1,1964
204,Sonoma County,Flood,1964-12-24,1964-12-24,1,1964
205,Stanislaus County,Flood,1964-12-24,1964-12-24,1,1964
206,Sutter County,Flood,1964-12-24,1964-12-24,1,1964
207,Tehama County,Flood,1964-12-24,1964-12-24,1,1964
208,Trinity County,Flood,1964-12-24,1964-12-24,1,1964
209,Tuolumne County,Flood,1964-12-24,1964-12-24,1,1964
210,Yolo County,Flood,1964-12-24,1964-12-24,1,1964
211,Yuba County,Flood,1964-12-24,1964-12-24,1,1964
212,Baker County,Flood,1964-12-24,1964-12-24,1,1964
213,Benton County,Flood,1964-12-24,1964-12-24,1,1964
214,Clackamas County,Flood,1964-12-24,1964-12-24,1,1964
215,Clatsop County,Flood,1964-12-24,1964-12-24,1,1964
216,Columbia County,Flood,1964-12-24,1964-12-24,1,1964
217,Coos County,Flood,1964-12-24,1964-12-24,1,1964
218,Crook County,Flood,1964-12-24,1964-12-24,1,1964
219,Curry County,Flood,1964-12-24,1964-12-24,1,1964
220,Deschutes County,Flood,1964-12-24,1964-12-24,1,1964
221,Douglas County,Flood,1964-12-24,1964-12-24,1,1964
222,Gilliam County,Flood,1964-12-24,1964-12-24,1,1964
223,Grant County,Flood,1964-12-24,1964-12-24,1,1964
224,Harney County,Flood,1964-12-24,1964-12-24,1,1964
225,Hood River County,Flood,1964-12-24,1964-12-24,1,1964
226,Jackson County,Flood,1964-12-24,1964-12-24,1,1964
227,Jefferson County,Flood,1964-12-24,1964-12-24,1,1964
228,Josephine County,Flood,1964-12-24,1964-12-24,1,1964
229,Klamath County,Flood,1964-12-24,1964-12-24,1,1964
230,Lake County,Flood,1964-12-24,1964-12-24,1,1964
231,Lane County,Flood,1964-12-24,1964-12-24,1,1964
232,Lincoln County,Flood,1964-12-24,1964-12-24,1,1964
233,Linn County,Flood,1964-12-24,1964-12-24,1,1964
234,Malheur County,Flood,1964-12-24,1964-12-24,1,1964
235,Marion County,Flood,1964-12-24,1964-12-24,1,1964
236,Morrow County,Flood,1964-12-24,1964-12-24,1,1964
237,Multnomah County,Flood,1964-12-24,1964-12-24,1,1964
238,Polk County,Flood,1964-12-24,1964-12-24,1,1964
239,Sherman County,Flood,1964-12-24,1964-12-24,1,1964
240,Tillamook County,Flood,1964-12-24,1964-12-24,1,1964
241,Umatilla County,Flood,1964-12-24,1964-12-24,1,1964
242,Union County,Flood,1964-12-24,1964-12-24,1,1964
243,Wallowa County,Flood,1964-12-24,1964-12-24,1,1964
244,Wasco County,Flood,1964-12-24,1964-12-24,1,1964
245,Washington County,Flood,1964-12-24,1964-12-24,1,1964
246,Wheeler County,Flood,1964-12-24,1964-12-24,1,1964
247,Yamhill County,Flood,1964-12-24,1964-12-24,1,1964
248,Asotin County,Flood,1964-12-29,1964-12-29,1,1964
249,Benton County,Flood,1964-12-29,1964-12-29,1,1964
250,Clark County,Flood,1964-12-29,1964-12-29,1,1964
251,Columbia County,Flood,1964-12-29,1964-12-29,1,1964
252,Cowlitz County,Flood,1964-12-29,1964-12-29,1,1964
253,Garfield County,Flood,1964-12-29,1964-12-29,1,1964
254,Grays Harbor County,Flood,1964-12-29,1964-12-29,1,1964
255,King County,Flood,1964-12-29,1964-12-29,1,1964
256,Kittitas County,Flood,1964-12-29,1964-12-29,1,1964
257,Klickitat County,Flood,1964-12-29,1964-12-29,1,1964
258,Lewis County,Flood,1964-12-29,1964-12-29,1,1964
259,Mason County,Flood,1964-12-29,1964-12-29,1,1964
260,Pacific County,Flood,1964-12-29,1964-12-29,1,1964
261,Pierce County,Flood,1964-12-29,1964-12-29,1,1964
262,Skamania County,Flood,1964-12-29,1964-12-29,1,1964
263,Snohomish County,Flood,1964-12-29,1964-12-29,1,1964
264,Spokane County,Flood,1964-12-29,1964-12-29,1,1964
265,Wahkiakum County,Flood,1964-12-29,1964-12-29,1,1964
266,Walla Walla County,Flood,1964-12-29,1964-12-29,1,1964
267,Whitman County,Flood,1964-12-29,1964-12-29,1,1964
268,Yakima County,Flood,1964-12-29,1964-12-29,1,1964
269,Ada County,Flood,1964-12-31,1964-12-31,1,1964
270,Bannock County,Flood,1964-12-31,1964-12-31,1,1964
271,Benewah County,Flood,1964-12-31,1964-12-31,1,1964
272,Blaine County,Flood,1964-12-31,1964-12-31,1,1964
273,Boise County,Flood,1964-12-31,1964-12-31,1,1964
274,Bonneville County,Flood,1964-12-31,1964-12-31,1,1964
275,Butte County,Flood,1964-12-31,1964-12-31,1,1964
276,Camas County,Flood,1964-12-31,1964-12-31,1,1964
277,Caribou County,Flood,1964-12-31,1964-12-31,1,1964
278,Cassia County,Flood,1964-12-31,1964-12-31,1,1964
279,Clearwater County,Flood,1964-12-31,1964-12-31,1,1964

1 answer
User
#1 · Posted 2024-03-28 20:00:18

You can call size on the groupby to get the counts:

#Number of Disasters each year.
df.groupby('Start_year').size()
Start_year
1959     1
1964    99
dtype: int64
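
Since the question asks for a new df rather than a Series, the result of size() can be passed through reset_index(name=...). The following is only a minimal sketch: the rows are made up for illustration and the column name Disaster_Count is a hypothetical choice, not something from the original data. It also shows why count() in the question returns one count per column rather than one number per year. Note that name is a parameter of Series.reset_index; DataFrame.reset_index does not accept it, which is why it is applied to the size() result here.

```python
import pandas as pd

# Tiny stand-in for df_time; the rows are illustrative, not the real data.
df = pd.DataFrame({
    "County": ["Clay County", "Alpine County", None, "Butte County"],
    "Disaster_Type": ["Flood", "Flood", "Fire", "Flood"],
    "Start_year": [1959, 1964, 1964, 1964],
})

# size() counts rows per group and returns a Series indexed by year.
per_year = df.groupby("Start_year").size()

# Series.reset_index(name=...) turns it into a two-column DataFrame.
# "Disaster_Count" is a hypothetical column name chosen for this sketch.
df_yearly_dcount = per_year.reset_index(name="Disaster_Count")

# count() instead returns non-null counts for every remaining column,
# so the missing County makes its count differ from the row count.
per_column = df.groupby("Start_year").count()

print(df_yearly_dcount)
print(per_column)
```

If a column contains NaN values, size() and count() can disagree, so size() is usually the safer choice when the goal is the number of rows per year.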

#Number of each disaster type for each year.
df.groupby(['Start_year', 'Disaster_Type']).size()
Start_year  Disaster_Type
1959        Flood             1
1964        Flood            99
dtype: int64
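
Because the title mentions value_counts, here is a hedged sketch of that route alongside the size() approach, plus two ways of shaping the per-type counts into a df. The rows are invented for illustration, and Disaster_Count is again a hypothetical column name; only Start_year and Disaster_Type come from the sample above.

```python
import pandas as pd

# Illustrative rows only; column names follow the sample above.
df = pd.DataFrame({
    "Disaster_Type": ["Flood", "Flood", "Fire", "Flood", "Fire"],
    "Start_year": [1959, 1964, 1964, 1964, 1965],
})

# value_counts on the grouped column gives counts per (year, type).
by_type = df.groupby("Start_year")["Disaster_Type"].value_counts()

# Long format: one row per (year, type) pair.
# "Disaster_Count" is a hypothetical column name.
df_long = (
    df.groupby(["Start_year", "Disaster_Type"])
      .size()
      .reset_index(name="Disaster_Count")
)

# Wide format: years as rows, disaster types as columns, 0 where absent.
df_wide = (
    df.groupby(["Start_year", "Disaster_Type"])
      .size()
      .unstack(fill_value=0)
)

print(by_type)
print(df_long)
print(df_wide)
```

The long format tends to suit further groupby or seaborn plotting, while the wide format is convenient for comparing disaster types side by side within a year.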
