根据每个帐户的付款行为创建状态列

2024-05-29 10:20:41 发布

您现在位置:Python中文网/ 问答频道 /正文

我目前正在进行一些手动任务的自动化。我是Python新手,我想请求您的帮助。我有以下数据帧:

    account_id  contract_id 2021-01-01 |  2021-02-01  |  2021-03-01  |  2021-04-01
0   account1    contract11  200.0         200.0          200.0          NaN 
1   account1    contract12  NaN           300.0          300.0          300.0   
2   account1    contract13  NaN           100.0          100.0          100.0   
3   account1    contract14  NaN           NaN            NaN            NaN 
4   account2    contract21  500.0         NaN            NaN            NaN 
5   account2    contract22  NaN           NaN            NaN            NaN 
6   account3    contract31  NaN           NaN            NaN            250.0   

我需要以下输出(带有一个新列“status”):

 account_id contract_id |   status    | 2021-01-01 |  2021-02-01  |  2021-03-01  |  2021-04-01
0   account1    contract11  Original    200.0         200.0          200.0          NaN 
1   account1    contract12  Upgrade     NaN           300.0          300.0          300.0   
2   account1    contract13  Downgrade   NaN           100.0          100.0          100.0   
3   account1    contract14  Renewal     NaN           NaN            NaN            NaN 
4   account2    contract21  Original    500.0         NaN            NaN            NaN 
5   account2    contract22  Winback     NaN           NaN            NaN            NaN 
6   account3    contract31  Original    NaN           NaN            NaN            250.0   

每个帐户至少可以有一个合同

如果它是第一份有第一笔付款的合同,那么它就是“原件”

如果合同是在前一份合同没有付款的4个月后签订的,则为“赢回”

如果合同的月付款额高于之前的合同,则为“升级”

如果合同的付款额低于前一份合同,则为“降级”

如果合同的付款额与前一份合同相等,则为“续签”

我尝试了以下代码(使用日期激活列),但无法正常工作:

def get_types(monthly_payments):
    def f(s):
        check = monthly_payments.loc[
            (s.date_activated.year == monthly_payments.index.year) &
            (s.date_activated.month == monthly_payments.index.month)
            ].iloc[0]

        if check.wb == 0:
            # If rolling sum of 4 months prior is 0
            s['status'] = 'Winback'
        elif check.og_upg == 0:
            # If Prior Month is 0
            s['status'] = 'Original'

        elif check.max_pmt > check.og_upg:
            # If Prior Month is not missing and current month is more
            s['status'] = 'Upgrade'
        else:
            s['status'] = 'Renewal'
    return f

def apply_types(g):
    # Get Non Payment Info
    account_info = g[g.columns[:4]]
    # Transpose Monthly Payments To Rows
    monthly_payments = g.loc[:, g.columns[4:]].T
    # Make Sure Index is DT
    monthly_payments.index = pd.to_datetime(monthly_payments.index)
    # Get Check for is early based on number of payments
    monthly_payments['early'] = monthly_payments.astype(bool).sum(axis=1) > 1
    # Max Payment In Month
    monthly_payments['max_pmt'] = monthly_payments.max(axis=1)
    # 1 Month Prior
    monthly_payments['og_upg'] = monthly_payments.max_pmt.shift().fillna(0)
    # Rolling Sum of 4 Months Prior
    monthly_payments['wb'] = monthly_payments.max_pmt \
        .rolling(min_periods=0, window=4).sum().shift()
    # Concat New Columns With Original Payment Information
    return pd.concat((
        account_info.apply(get_types(monthly_payments), axis=1),
        g[g.columns[4:]]
    ), axis=1)

df = df.groupby('account_id', as_index=False).apply(apply_types).reset_index(drop=True)

Tags: idindexischeckstatusaccountnanmax
1条回答
网友
1楼 · 发布于 2024-05-29 10:20:41

我建议您首先将df重塑为长形,以便于操作。从原始df上的stack开始,去掉nans,重命名列,并将date强制转换为datetime

df = (original_df.set_index(["account_id", "contract_id"]).stack().reset_index()
                 .rename(columns={"level_2":"date", 0:"amount"}))
df["date"] = pd.to_datetime(df["date"])
print (df)

   account_id contract_id       date  amount
0    account1  contract11 2021-01-01   200.0
1    account1  contract11 2021-02-01   200.0
2    account1  contract11 2021-03-01   200.0
3    account1  contract12 2021-02-01   300.0
...
18   account3  contract31 2021-04-01   250.0
19   account3  contract31 2021-05-01   250.0
20   account3  contract31 2021-06-01   250.0

现在,我们可以通过将每个id的第一个合同分配给original来逐步完成:

df.loc[df["contract_id"].eq(df.groupby("account_id")["contract_id"].transform("first")), "status"] = "Original"

接下来,设置不同的需求并使用np.select

cond1 = df["contract_id"].ne(df["contract_id"].shift())
cond2 = df["date"].diff().dt.days>=120
cond3 = df["amount"].diff().gt(0)
cond4 = df["amount"].diff().lt(0)
cond5 = df["amount"].diff().eq(0)
cond6 = df["status"].isnull()

df["status"] = (np.select([cond1&cond2&cond6, cond1&cond3&cond6, cond1&cond4&cond6, cond1&cond5&cond6],
                          ["Winback", "Upgrade", "Downgrade", "Renewal"], df["status"]))

最后将结果分配给原始df:

original_df["status"] = df.groupby(["account_id", "contract_id"])["status"].first().tolist()
print (original_df)

  account_id contract_id  2021-01-01  2021-02-01  2021-03-01  2021-04-01  2021-05-01  2021-06-01  2021-07-01     status
0   account1  contract11       200.0       200.0       200.0         NaN         NaN         NaN         NaN   Original
1   account1  contract12         NaN       300.0       300.0       300.0         NaN         NaN         NaN    Upgrade
2   account1  contract13         NaN       100.0       100.0       100.0       100.0       100.0       100.0  Downgrade
3   account1  contract14         NaN         NaN         NaN         NaN       100.0       100.0       100.0    Renewal
4   account2  contract21       500.0         NaN         NaN         NaN         NaN         NaN         NaN   Original
5   account2  contract22         NaN         NaN         NaN         NaN         NaN       300.0       300.0    Winback
6   account3  contract31         NaN         NaN         NaN       250.0       250.0       250.0         NaN   Original

相关问题 更多 >

    热门问题