如何在数据帧列上最好地执行递归

2024-06-16 09:41:19 发布

您现在位置:Python中文网/ 问答频道 /正文

我试图计算一个数据帧内时间序列的索引值。此索引依赖于前一行的结果来计算第一次迭代后的每一行。我尝试在数据帧行的迭代中递归地执行此操作,但是我发现计算的前两行是正确的,但是第三行和随后的行是不准确的。你知道吗

我认为这是因为在初始值之后,后续的索引计算出错,然后将所有其他后续计算都设置为错误。你知道吗

是什么导致了这种不准确。有比我采取的更好的方法吗?你知道吗

输出的示例如下所示:


ticket_cat   Sector   Year       factor        Incorrect_index_value  correct_index_value     prev_row
Revenue      LSE      Jan 2004                 100.00                 100.00                  
Revenue      LSE      Jan 2005   4.323542894   104.3235               104.3235                100.00
Revenue      LSE      Jan 2006   3.096308080   98.823                 107.5537      <--incorrect row        
Revenue      LSE      Jan 2007   6.211666      107.476                114.2345  <--incorrect row              
Revenue      LD       Jan 2004                 100.00                 100.0000
Revenue      LD       Jan 2005   3.5218        103.5218               103.5218
Revenue      LD       Jan 2006   2.7417        99.2464                106.3602   <--- incorrect row
Revenue      LD       Jan 2007   3.3506        104.1353               109.9239  <--- incorrect row                           

我的代码片段如下:stpassrev是数据帧

#insert initial value for index
stpassrev['index_value'] = np.where(
       (stpassrev['Year'] == 'Jan 2004' ) & (stpassrev['Ticket_cat']=='Revenue'),
        100.00,np.nan )

#set up initial values for prec_row column
stpassrev['prev_row'] = np.where(
              #only have relevant row impacted
                (stpassrev['Year'] == 'Jan 2005' ) & (stpassrev['Ticke_cat']=='Revenue'),
        100.00,
        np.nan
        )

#calculate the index_value
for i in range(1,len(stpassrev)):
        stpassrev.loc[i,'passrev'] = np.where(
            (stpassrev.loc[i,'Ticket_cat']=='Revenue'  )  & (pd.isna(stpassrev.loc[i,'factor'])==False),
                ((100+stpassrev.loc[i,'factor'] ) /stpassrev.loc[i-1,'index_value'])*100,
                stpassrev.loc[i,'index_value'])


     stpassrev.loc[i,'prev_row'] = stpassrev.loc[i-1,'index_value']


Tags: 数据indexvaluenpyearlocjancat
2条回答

你的问题是你在执行除法,而实际上你需要乘法。您的代码可以简化为:

df['index_value'] = (df.factor.fillna(0)
                       .div(100).add(1)
                       .groupby(df['Sector'])
                       .cumprod()
                    )

输出:

  ticket_cat Sector      Year    factor  index_value
0    Revenue    LSE  Jan 2004       NaN     1.000000
1    Revenue    LSE  Jan 2005  4.323543     1.043235
2    Revenue    LSE  Jan 2006  3.096308     1.075537
3    Revenue    LSE  Jan 2007  6.211666     1.142346
4    Revenue     LD  Jan 2004       NaN     1.000000
5    Revenue     LD  Jan 2005  3.521800     1.035218
6    Revenue     LD  Jan 2006  2.741700     1.063601
7    Revenue     LD  Jan 2007  3.350600     1.099238

根据更新的问题,您只需执行以下操作:

# assign a new temp_factor with initial values and prep for cumprod
stpassrev['temp_factor'] = np.where(stpassrev['factor'].isna(), 1, stpassrev['factor'].add(100).div(100))

# calculate the cumprod based on the temp_factor (grouped by Sector) and multiply by 100 for index_value
stpassrev['index_value'] = stpassrev.groupby('Sector')['temp_factor'].cumprod().mul(100)

结果:

  ticket_cat Sector      Year    factor  temp_factor  index_value
0    Revenue    LSE  Jan 2004       NaN     1.000000   100.000000
1    Revenue    LSE  Jan 2005  4.323543     1.043235   104.323543
2    Revenue    LSE  Jan 2006  3.096308     1.030963   107.553721
3    Revenue    LSE  Jan 2007  6.211666     1.062117   114.234599
4    Revenue     LD  Jan 2004       NaN     1.000000   100.000000
5    Revenue     LD  Jan 2005  3.521800     1.035218   103.521800
6    Revenue     LD  Jan 2006  2.741700     1.027417   106.360057
7    Revenue     LD  Jan 2007  3.350600     1.033506   109.923757

如果需要四舍五入到4位精度,请在.mul(100)之后添加.round(4)

stpassrev['index_value'] = stpassrev.groupby('Sector')['temp_factor'].cumprod().mul(100).round(4)

  ticket_cat Sector      Year    factor  temp_factor  index_value
0    Revenue    LSE  Jan 2004       NaN     1.000000     100.0000
1    Revenue    LSE  Jan 2005  4.323543     1.043235     104.3235
2    Revenue    LSE  Jan 2006  3.096308     1.030963     107.5537
3    Revenue    LSE  Jan 2007  6.211666     1.062117     114.2346
4    Revenue     LD  Jan 2004       NaN     1.000000     100.0000
5    Revenue     LD  Jan 2005  3.521800     1.035218     103.5218
6    Revenue     LD  Jan 2006  2.741700     1.027417     106.3601
7    Revenue     LD  Jan 2007  3.350600     1.033506     109.9238

相关问题 更多 >