<p>Given your dataframes, this should work:</p>
<pre><code>import pandas as pd
from dateutil.relativedelta import relativedelta

# Transform columns to datetime
payments['date'] = pd.to_datetime(payments['date'])
agreement['activation'] = pd.to_datetime(agreement['activation'])
final = pd.merge(payments, agreement, on='agreement_id', how='left')
# set date to beginning of month
final['date'] = pd.to_datetime(final.date).dt.to_period('M').dt.to_timestamp()
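def set_date_range(df):
    if df['payment'].sum() == df['total_fee'].iloc[0]:
        # Fully paid: one date per month of the agreement term
        # ('M' is the month-end alias; pandas >= 2.2 spells it 'ME')
        return pd.date_range(df['date'].min(), periods=df['term_months'].iloc[0], freq='M')
    else:
        # Not fully paid: only the months from the first to the last payment
        return pd.date_range(df['date'].min(),
                             df['date'].max() + relativedelta(months=+1), freq='M')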
# Create dataframe with dates
seq_df = pd.DataFrame()
for i, g in final.groupby(['cust_id', 'agreement_id']):
    seq_df = pd.concat([seq_df,
                        pd.DataFrame({'cust_id': i[0], 'agreement_id': i[1], 'date': set_date_range(g)})])
# Set date to beginning of month
seq_df['date'] = pd.to_datetime(seq_df.date).dt.to_period('M').dt.to_timestamp()
final = (pd.concat([final, seq_df], sort=True)
         .sort_values(['cust_id', 'agreement_id', 'date'])
         .reset_index(drop=True)
         .reindex(columns=final.columns))
final['payment'] = final.groupby(by=['cust_id', 'agreement_id'])["payment"].transform("sum")
final = final.drop_duplicates(['cust_id', 'agreement_id', 'date'])
final['n'] = final.groupby(by=['cust_id', 'agreement_id'])["cust_id"].transform("count")
final['payment_due'] = final['payment']/final['n']
final[['cust_id','agreement_id','date', 'payment_due']]
</code></pre>
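<p>I couldn't exactly replicate the pipe style of <code>tidyverse</code>, but the output should match. The hardest part is building <code>seq_df</code>; it should be fine, but double-check it against more general use cases.</p>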
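<p>To double-check the date-range logic in isolation, here is a minimal sketch on made-up data. It is a variation rather than the exact code above: <code>month_starts</code> is a hypothetical helper name, the toy values are invented, and it uses <code>pd.period_range</code> instead of <code>pd.date_range(..., freq='M')</code> so that it yields month starts directly and does not depend on the month-end alias.</p>

```python
import pandas as pd


def month_starts(group):
    """Month-start timestamps one (cust_id, agreement_id) group should cover.

    Hypothetical helper: same branching as set_date_range, but built on periods.
    """
    first = group['date'].min().to_period('M')
    if group['payment'].sum() == group['total_fee'].iloc[0]:
        # Fully paid: cover the whole agreement term, starting at the first payment month.
        months = pd.period_range(first, periods=int(group['term_months'].iloc[0]), freq='M')
    else:
        # Not fully paid: cover only the months from the first to the last payment.
        last = group['date'].max().to_period('M')
        months = pd.period_range(first, last, freq='M')
    return months.to_timestamp()


# Toy data (invented): agreement A is fully paid, agreement B is not.
toy = pd.DataFrame({
    'cust_id': [1, 1, 2],
    'agreement_id': ['A', 'A', 'B'],
    'date': pd.to_datetime(['2019-01-15', '2019-02-20', '2019-03-05']),
    'payment': [100, 200, 50],
    'total_fee': [300, 300, 600],
    'term_months': [3, 3, 6],
})

paid = month_starts(toy[toy['agreement_id'] == 'A'])
partial = month_starts(toy[toy['agreement_id'] == 'B'])
```

<p>With this toy input, <code>paid</code> holds the three month starts from January to March 2019 (the full 3-month term), while <code>partial</code> holds only March 2019, the single month in which a payment was made.</p>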