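I have data for three variables, and I want to find the top X values of one variable for each day. I previously wrote some code that finds the hour at which the daily maximum occurs, but now I want to add options for finding more than one top hour per day.
I have been able to get the top X values for each day, but I am stuck on narrowing that down to the top X values of the top X days. I have also attached some images detailing the final result.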
Code
import pandas as pd

df = pd.DataFrame(
{'ID':['ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1','ID_1'],
'Year':[2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018],
'Month':[6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6],
'Day':[12,12,12,12,13,13,13,13,14,14,14,14,15,15,15,15,16,16,16,16,17,17,17,17],
'Hour':[19,20,21,22,11,12,13,19,19,20,21,22,18,19,20,21,19,20,21,23,19,20,21,22],
'var_1': [0.83,0.97,0.69,0.73,0.66,0.68,0.78,0.82,1.05,1.05,1.08,0.88,0.96,0.81,0.71,0.88,1.08,1.02,0.88,0.79,0.91,0.91,0.80,0.96],
'var_2': [47.90,42.85,67.37,57.18,66.13,59.96,52.63,54.75,32.54,36.58,36.99,37.23,46.94,52.80,68.79,50.84,37.79,43.54,48.04,38.01,42.22,47.13,50.96,44.19],
'var_3': [99.02,98.10,98.99,99.12,98.78,98.90,99.09,99.20,99.22,99.11,99.18,99.24,99.00,98.90,98.87,99.07,99.06,98.86,98.92,99.32,98.93,98.97,98.99,99.21],})
# Get the top 2 var2 values each day
top_two_var2_each_day = df.groupby(['ID', 'Year', 'Month', 'Day'])['var_2'].nlargest(2)
top_two_var2_each_day = top_two_var2_each_day.reset_index()
# reset_index() stored the original row labels in 'level_4'; make them the index again
top_two_var2_each_day = top_two_var2_each_day.set_index('level_4')
# use that index to select the matching rows of df, giving the other variables at the hours when the top 2 values occurred
top_2_all_vars = df[df.index.isin(top_two_var2_each_day.index)]
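Desired end result
I think the best approach is to average those two hours to determine which days have the highest mean, and then go back to the top_2_all_vars dataframe and pull the rows for those days. I am not sure how to do that.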
mean_day = top_2_all_vars.groupby(['ID', 'Year', 'Month', 'Day'],as_index=False)['var_2'].mean()
top_2_day = mean_day.nlargest(2, 'var_2')
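This is the result I am looking for: a dataframe consisting of the top 2 var_2 values from each of the top 2 days.
The code I used before finds the single maximum for each day, but I do not know how to make it work for more than one maximum per day.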
# For each ID and Day, Find the Hour where the Max Amount of var_2 occurred and save the index location
df_idx = df.groupby(['ID', 'Year', 'Month', 'Day'])['var_2'].transform('max') == df['var_2']
# Now the hour has been found, store the rows in a new dataframe based on the saved index location
top_var2_hour_of_each_day = df[df_idx]
Using groupbys may not be the best way to go about this, but I am open to anything.
Here is one way:
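If your data spans multiple months, it is much harder to work with when the month and day sit in separate columns. So first I made a new column called 'Date' that combines the month and day.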
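A minimal sketch of this step, assuming a real datetime column is acceptable in place of a combined month/day label. It uses `pd.to_datetime` to assemble the date from the year, month, and day columns. The names `sample` and `parts` are illustrative and not from the original answer.

```python
import pandas as pd

# Small illustrative frame with the same column layout as the question's df
sample = pd.DataFrame({
    'Year': [2018, 2018, 2018],
    'Month': [6, 6, 7],
    'Day': [12, 13, 1],
    'var_2': [47.90, 66.13, 32.54],
})

# pd.to_datetime can assemble dates from year/month/day columns;
# the columns are renamed to lowercase to match the names it expects
parts = sample[['Year', 'Month', 'Day']].rename(columns=str.lower)
sample['Date'] = pd.to_datetime(parts)
```

Including the year keeps dates unambiguous if the data ever spans more than one year.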
Next, we need the top two var_2 values for each day, and then their mean. So we can write a very simple function that finds exactly that.
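One way such a helper might look, as a sketch. The function name `mean_of_top_n` and its `n` parameter are hypothetical and not taken from the original answer.

```python
import pandas as pd

def mean_of_top_n(values, n=2):
    """Return the mean of the n largest entries in a Series."""
    return values.nlargest(n).mean()

# var_2 readings for 12 June from the question's data
day_values = pd.Series([47.90, 42.85, 67.37, 57.18])
top_two_mean = mean_of_top_n(day_values)  # mean of 67.37 and 57.18
```

Making `n` a parameter lets the same helper cover the "top X hours" case from the question rather than only the top two.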
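Then we apply our function, sort by the mean of var_2 to get the top 2 days, and save those dates to a list.
Finally, we filter on the dates selected above and then find the highest var_2 values on those two days.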
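The steps above can be sketched end to end as follows. This is an assumption-laden reconstruction, not the answer's original code: it builds `Date` with `pd.to_datetime`, assumes a single ID as in the sample data, and uses the illustrative names `n_hours`, `n_days`, `daily_mean`, `top_dates`, and `result`.

```python
import pandas as pd

# Rebuild the question's data (only the columns needed here)
df = pd.DataFrame({
    'ID': ['ID_1'] * 24,
    'Year': [2018] * 24,
    'Month': [6] * 24,
    'Day': [d for d in range(12, 18) for _ in range(4)],
    'Hour': [19, 20, 21, 22, 11, 12, 13, 19, 19, 20, 21, 22,
             18, 19, 20, 21, 19, 20, 21, 23, 19, 20, 21, 22],
    'var_2': [47.90, 42.85, 67.37, 57.18, 66.13, 59.96, 52.63, 54.75,
              32.54, 36.58, 36.99, 37.23, 46.94, 52.80, 68.79, 50.84,
              37.79, 43.54, 48.04, 38.01, 42.22, 47.13, 50.96, 44.19],
})
df['Date'] = pd.to_datetime(df[['Year', 'Month', 'Day']].rename(columns=str.lower))

n_hours, n_days = 2, 2  # hypothetical parameters: top hours per day, top days

# Mean of the top n_hours var_2 values for each ID and date
daily_mean = (
    df.groupby(['ID', 'Date'])['var_2']
      .apply(lambda s: s.nlargest(n_hours).mean())
      .reset_index(name='top_mean')
)

# Dates of the n_days highest means, saved to a list
top_dates = daily_mean.nlargest(n_days, 'top_mean')['Date'].tolist()

# Keep only those dates, then take the top n_hours rows within each date.
# The last index level of the grouped nlargest result holds df's row labels.
in_top_days = df[df['Date'].isin(top_dates)]
top_idx = (
    in_top_days.groupby(['ID', 'Date'])['var_2']
               .nlargest(n_hours)
               .index.get_level_values(-1)
)
result = df.loc[top_idx].sort_index()
```

With the sample data, `result` holds the rows for hours 21 and 22 on 12 June and hours 11 and 12 on 13 June. With several IDs, the date filter would need to be applied per ID (for example by merging on both `ID` and `Date`) rather than on `Date` alone.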