月度经常性支出的Python代码

-3 投票
1 回答
69 浏览
提问于 2025-04-14 15:55

我正在尝试找出每月重复的支出,这些支出需要有相同的ID、类别和金额。而且这些支出必须至少在连续的3个月中出现。

我写的代码没有得到正确的结果。

import pandas as pd

# Updated Sample DataFrame
data = {
    'id': [1, 1, 1, 2, 2, 2, 1, 1, 2, 2,2,2],
    'date': ['2023-01-17', '2023-01-15', '2022-11-16', '2023-01-16', '2022-12-14', '2022-11-10', '2022-12-20', '2022-12-10', '2023-01-25', '2022-11-05','2022-10-05','2022-09-07'],
    'category': ['Groceries', 'Utilities', 'Groceries', 'Groceries', 'Utilities', 'Groceries', 'Groceries', 'Utilities', 'Groceries', 'Utilities','Utilities','Utilities'],
    'amount': [50, 50, 50, 60, 80, 80, 50, 55, 75, 75,75,75]
}

df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])  # Convert 'date' column to datetime format

# Sort DataFrame 
df = df.sort_values(by=['id','category','amount', 'date'], ascending=[True, True,True,False])

df['recurring'] = 0

for i in range(len(df)-2):
    
    if (df.iloc[i]['id'] == df.iloc[i+1]['id'] == df.iloc[i+1]['id']) and \
    (df.iloc[i]['category'] == df.iloc[i+1]['category'] == df.iloc[i+1]['category']) and \
    df.iloc[i]['amount'] == df.iloc[i+1]['amount'] == df.iloc[i+1]['amount'] and \
    (df.iloc[i]['date'].to_period('M') - df.iloc[i-1]['date'].to_period('M')) == 1 and (df.iloc[i-1]['date'].to_period('M') - df.iloc[i-2]['date'].to_period('M')) == 1:
        df['recurring'] = 1
        
print(df)

但是我得到的结果是:所有重复支出的值都是0。我的逻辑哪里出错了呢?

 id       date   category  amount  recurring
0    1 2023-01-17  Groceries      50          0
6    1 2022-12-20  Groceries      50          0
2    1 2022-11-16  Groceries      50          0
1    1 2023-01-15  Utilities      50          0
7    1 2022-12-10  Utilities      55          0
3    2 2023-01-16  Groceries      60          0
8    2 2023-01-25  Groceries      75          0
5    2 2022-11-10  Groceries      80          0
9    2 2022-11-05  Utilities      75          0
10   2 2022-10-05  Utilities      75          0
11   2 2022-09-07  Utilities      75          0
4    2 2022-12-14  Utilities      80          0

我期望的输出是:

 id       date   category  amount  recurring
0    1 2023-01-17  Groceries      50          1
6    1 2022-12-20  Groceries      50          0
2    1 2022-11-16  Groceries      50          0
1    1 2023-01-15  Utilities      50          0
7    1 2022-12-10  Utilities      55          0
3    2 2023-01-16  Groceries      60          0
8    2 2023-01-25  Groceries      75          0
5    2 2022-11-10  Groceries      80          0
9    2 2022-11-05  Utilities      75          1
10   2 2022-10-05  Utilities      75          0
11   2 2022-09-07  Utilities      75          0
4    2 2022-12-14  Utilities      80          0

1 个回答

0

你可以使用 groupby.rolling,不过因为按月份滚动的功能不支持,所以你需要把日期转换成按月的时间段:

df['period'] = pd.to_datetime(df['date']).dt.to_period('M')

group = ['id', 'category', 'amount']
N = 3 # number of consecutive months

df['recurring'] = (df
   .assign(recurring=1)
   .sort_values(by=group+['date'])
   # ensure we count a type of group only once
   .drop_duplicates(group+['period'])
   .groupby(group).rolling(N, on='period')['recurring']
   # check that we have N consecutive months in the window
   .apply(lambda x: x.index.nunique() == N and 
          (x.index.max()-x.index.min()).n == N-1)
   .fillna(0).astype(int)
   .reset_index(name='recurring')
   .merge(df[group+['period']], how='right')
   ['recurring'].to_numpy()
)

输出结果:

    id       date   category  amount  recurring   period
0    1 2023-01-17  Groceries      50          1  2023-01
6    1 2022-12-20  Groceries      50          0  2022-12
2    1 2022-11-16  Groceries      50          0  2022-11
1    1 2023-01-15  Utilities      50          0  2023-01
7    1 2022-12-10  Utilities      55          0  2022-12
3    2 2023-01-16  Groceries      60          0  2023-01
8    2 2023-01-25  Groceries      75          0  2023-01
5    2 2022-11-10  Groceries      80          0  2022-11
9    2 2022-11-05  Utilities      75          1  2022-11
10   2 2022-10-05  Utilities      75          0  2022-10
11   2 2022-09-07  Utilities      75          0  2022-09
4    2 2022-12-14  Utilities      80          0  2022-12

另外,你也可以使用按天的窗口(rolling 不支持按月份的频率):

df['date'] = pd.to_datetime(df['date'])

window = '63D'
N = 3

df['recurring'] = (df
   .sort_values(by=['id', 'category', 'date'])
   .groupby(['id', 'category']).rolling(window, on='date')
   ['amount'].apply(lambda x: len(x)).ge(N).astype(int)
   .reset_index(name='recurring')
   .merge(df[['id', 'category', 'date']].drop_duplicates(),
          how='right')
   ['recurring']
)

撰写回答