Can I show the different execution methods and forecast the next few years?

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

df['Date'] = pd.to_datetime(df['Date'])
res = df[~(df['Date'] < '1999-01-01')]
print(res)
Count = res['Date'].value_counts()
print(Count)
time = df['Date'] = pd.to_datetime(df['Date'])
df['Date'] = df['Date'].map(dt.datetime.toordinal)
print(time)
x = np.array(time)
y = np.array(Count)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=1/3, random_state=0)

1 answer

User

#1 · Posted 2024-04-25 21:51:55

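It sounds like you want to reshape your data so that you have a time series for each 'Method', which you can then feed into a forecasting model. It is worth pointing out that the distribution of 'Method' is heavily skewed (counts from 1999 onward), so forecasting most of the methods will be difficult, if not impossible: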

df['Method'].value_counts()

# Lethal Injection    923
# Electrocution        17
# Gas Chamber           1
# Firing Squad          1

Here is a solution that reshapes your data into a time series for each 'Method' (I've added more explanation at the end):

df['Date'] = pd.to_datetime(df['Date'])

df = df[df['Date'].dt.year >= 1999]

df = df.set_index('Date')

df2 = df.groupby('Method').resample('1M').agg('count')['Name'].to_frame()

df2 = df2.reset_index().pivot(index='Date',columns='Method',values='Name').fillna(0)

df2.plot()

We can check that the reshaped data still gives the correct count for each 'Method':

df2.sum()

# Method
# Electrocution        17.0
# Firing Squad          1.0
# Gas Chamber           1.0
# Lethal Injection    923.0
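For readers who want to try the reshape without the original dataset, here is a minimal sketch on made-up records. It is not the code above but an alternative route to the same month-by-method table: it groups on monthly periods and then reindexes so that months with no rows appear as zeros. The toy rows and the names `periods`, `full_range` and `monthly` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical toy records standing in for the real execution data.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "1999-01-05", "1999-01-20", "1999-03-02", "1999-03-15", "1999-04-01",
    ]),
    "Method": [
        "Lethal Injection", "Lethal Injection", "Gas Chamber",
        "Lethal Injection", "Electrocution",
    ],
})

# Map every date to its calendar month (a pandas Period).
periods = df["Date"].dt.to_period("M")

# Count rows per (month, method), then spread the methods into columns.
monthly = df.groupby([periods, "Method"]).size().unstack(fill_value=0)

# Add the months that had no rows at all, filled with zeros.
full_range = pd.period_range(periods.min(), periods.max(), freq="M", name="Date")
monthly = monthly.reindex(full_range, fill_value=0)

print(monthly)
print(monthly.sum())  # per-method totals, comparable to df['Method'].value_counts()
```

The column totals should match `df['Method'].value_counts()`, which is the same sanity check as above. Counts stay integers here because no NaN values are introduced along the way.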

Explanation

df['Date'] = pd.to_datetime(df['Date'])

# Keep only rows dated 1999 or later
df = df[df['Date'].dt.year >= 1999]

# Set the index to be the datetime
df = df.set_index('Date')

# This is the interesting part: group by method, then resample within each
# group so that there is one row per month, holding the number of original rows
# that fall in that month. count() tallies non-null values per column, so every
# column carries the same count (assuming no missing values) and we arbitrarily
# keep the first one, 'Name'.
# Note: you can change the resampling frequency to any period you like; I chose
# monthly because it is granular enough to cover the whole period. On
# pandas 2.2 and later the month-end alias is 'ME'; 'M' is deprecated there.
 
df2 = df.groupby('Method').resample('1M').agg('count')['Name'].to_frame()

#                              Name
# Method           Date            
# Electrocution    1999-06-30     1
#                  1999-07-31     1
#                  1999-08-31     1
#                  1999-09-30     0
#                  1999-10-31     0
# ...                           ...
# Lethal Injection 2016-08-31     0
#                  2016-09-30     0
#                  2016-10-31     2
#                  2016-11-30     1
#                  2016-12-31     2

df2 = df2.reset_index().pivot(index='Date',columns='Method',values='Name').fillna(0)

# Method      Electrocution  Firing Squad  Gas Chamber  Lethal Injection
# Date                                                                  
# 1999-01-31            0.0           0.0          0.0              10.0
# 1999-02-28            0.0           0.0          0.0              12.0
# 1999-03-31            0.0           0.0          1.0               7.0
# 1999-04-30            0.0           0.0          0.0              10.0
# 1999-05-31            0.0           0.0          0.0               6.0
# ...                   ...           ...          ...               ...
# 2016-08-31            0.0           0.0          0.0               0.0
# 2016-09-30            0.0           0.0          0.0               0.0
# 2016-10-31            0.0           0.0          0.0               2.0
# 2016-11-30            0.0           0.0          0.0               1.0
# 2016-12-31            0.0           0.0          0.0               2.0
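As for the forecasting part of the question, the sketch below shows one naive way to extend a single method's series a few years ahead. It assumes a plain linear trend on yearly totals, fitted by ordinary least squares with `np.polyfit` (the same straight line a one-feature `LinearRegression` would fit). The monthly numbers are invented stand-ins for the 'Lethal Injection' column of `df2`, so the output says nothing about the real data, and a straight line may well be a poor model for it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2: six years of made-up monthly counts
# that fall by one per month each year (12, 11, ..., 7).
idx = pd.date_range("1999-01-01", periods=72, freq="MS")
counts = 12 - (idx.year.to_numpy() - 1999)
monthly = pd.DataFrame({"Lethal Injection": counts}, index=idx)

# Aggregate to yearly totals; monthly counts are too sparse and noisy to fit directly.
yearly = monthly["Lethal Injection"].groupby(monthly.index.year).sum()

# Fit a straight line to (year, total). Years are shifted to start at 0
# to keep the least-squares problem well conditioned.
years = yearly.index.to_numpy(dtype=float)
slope, intercept = np.polyfit(years - years[0], yearly.to_numpy(dtype=float), deg=1)

# Extrapolate three years past the data; counts cannot go below zero.
future_years = np.arange(2005, 2008)
forecast = pd.Series(
    slope * (future_years - years[0]) + intercept, index=future_years
).clip(lower=0)

print(forecast)
```

The rare methods (a handful of rows since 1999) do not contain enough signal for any trend fit, which is why only the dominant column is used here.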
