我目前正在进行一些手动任务的自动化。我是Python新手,我想请求您的帮助。我有以下数据帧:
account_id contract_id 2021-01-01 | 2021-02-01 | 2021-03-01 | 2021-04-01
0 account1 contract11 200.0 200.0 200.0 NaN
1 account1 contract12 NaN 300.0 300.0 300.0
2 account1 contract13 NaN 100.0 100.0 100.0
3 account1 contract14 NaN NaN NaN NaN
4 account2 contract21 500.0 NaN NaN NaN
5 account2 contract22 NaN NaN NaN NaN
6 account3 contract31 NaN NaN NaN 250.0
我需要以下输出(带有一个新列“status”):
account_id contract_id | status | 2021-01-01 | 2021-02-01 | 2021-03-01 | 2021-04-01
0 account1 contract11 Original 200.0 200.0 200.0 NaN
1 account1 contract12 Upgrade NaN 300.0 300.0 300.0
2 account1 contract13 Downgrade NaN 100.0 100.0 100.0
3 account1 contract14 Renewal NaN NaN NaN NaN
4 account2 contract21 Original 500.0 NaN NaN NaN
5 account2 contract22 Winback NaN NaN NaN NaN
6 account3 contract31 Original NaN NaN NaN 250.0
每个帐户至少可以有一个合同
如果它是第一份有第一笔付款的合同,那么它就是“原件”
如果合同是在前一份合同没有付款的4个月后签订的,则为“赢回”
如果合同的月付款额高于之前的合同,则为“升级”
如果合同的付款额低于前一份合同,则为“降级”
如果合同的付款额与前一份合同相等,则为“续签”
我尝试了以下代码(使用日期激活列),但无法正常工作:
def get_types(monthly_payments):
def f(s):
check = monthly_payments.loc[
(s.date_activated.year == monthly_payments.index.year) &
(s.date_activated.month == monthly_payments.index.month)
].iloc[0]
if check.wb == 0:
# If rolling sum of 4 months prior is 0
s['status'] = 'Winback'
elif check.og_upg == 0:
# If Prior Month is 0
s['status'] = 'Original'
elif check.max_pmt > check.og_upg:
# If Prior Month is not missing and current month is more
s['status'] = 'Upgrade'
else:
s['status'] = 'Renewal'
return f
def apply_types(g):
# Get Non Payment Info
account_info = g[g.columns[:4]]
# Transpose Monthly Payments To Rows
monthly_payments = g.loc[:, g.columns[4:]].T
# Make Sure Index is DT
monthly_payments.index = pd.to_datetime(monthly_payments.index)
# Get Check for is early based on number of payments
monthly_payments['early'] = monthly_payments.astype(bool).sum(axis=1) > 1
# Max Payment In Month
monthly_payments['max_pmt'] = monthly_payments.max(axis=1)
# 1 Month Prior
monthly_payments['og_upg'] = monthly_payments.max_pmt.shift().fillna(0)
# Rolling Sum of 4 Months Prior
monthly_payments['wb'] = monthly_payments.max_pmt \
.rolling(min_periods=0, window=4).sum().shift()
# Concat New Columns With Original Payment Information
return pd.concat((
account_info.apply(get_types(monthly_payments), axis=1),
g[g.columns[4:]]
), axis=1)
df = df.groupby('account_id', as_index=False).apply(apply_types).reset_index(drop=True)
我建议您首先将df重塑为长形,以便于操作。从原始df上的
stack
开始,去掉nans
,重命名列,并将date
强制转换为datetime
:现在,我们可以通过将每个id的第一个合同分配给
original
来逐步完成:接下来,设置不同的需求并使用
np.select
:最后将结果分配给原始df:
相关问题 更多 >
编程相关推荐