Getting total event counts for different groups in a large pandas DataFrame

country  product  date_install  date_purchase  user_id
BR       yearly   2020-11-01    2020-11-01     10660236
CA       monthly  2020-11-01    2020-11-01     10649441
US       yearly   2020-11-01    trialed        10660272
IT       monthly  2020-11-01    2020-11-01     10657634
AE       monthly  2020-11-01    2020-11-01     10661442
IT       monthly  2020-11-01    trialed        10657634
AE       monthly  2020-11-01    trialed        10661442

country  product  date_install  installs  purchases  ratio
US       daily    2021-02-05    100       20         0.2
US       monthly  2021-02-05    100       50         0.5
US       yearly   2021-02-05    100       50         0.5
US       trialed  2021-02-05    100       0          0
# the next day
US       daily    2021-02-06    500       50         0.1
US       monthly  2021-02-06    500       100        0.2
US       yearly   2021-02-06    500       250        0.5
US       trialed  2021-02-06    500       0          0
# the rest of the countries & the rest of the days

exp = (df.groupby(['country','product','date_install']).count()
         .sort_values('date_install',ascending=False).reset_index())
exp.groupby(['country','product','date_install'])['date_purchase'].sum().reset_index()
exp['total_installs'] = exp.groupby(['country','product','date_install'])['date_purchase'].sum().reset_index()

df['date_purchase'] = df['date_purchase'].replace('trialed', np.nan)

exp = (df.groupby(['country','product','date_install'])
         .agg(installs = ('date_purchase','size'), purchases = ('date_purchase','count')))
exp['ratio'] = exp['purchases'].div(exp['installs'])
exp = exp.reset_index()

country  product      date_install  installs  purchases  ratio
US       catalog30US  2020-11-18    1         1          1.0
US       trialed      2020-11-18    4924      0          0.0
US       renders.100  2020-11-18    2         2          1.0
US       renders.20   2020-11-18    3         3          1.0
US       monthly      2020-11-18    37        37         1.0
US       yearly       2020-11-18    6         6          1.0
US       textures     2020-11-18    1         1          1.0

country  product      date_install  installs  purchases  ratio
US       catalog30US  2020-11-18    4974      1          1 / 4974
US       trialed      2020-11-18    4974      0          0.0
US       renders.100  2020-11-18    4974      2          2 / 4974
US       renders.20   2020-11-18    4974      3          3 / 4974
US       monthly      2020-11-18    4974      37         37 / 4974
US       yearly       2020-11-18    4974      6          6 / 4974
US       textures     2020-11-18    4974      1          1 / 4974

1 Answer

User

#1 · Posted on 2024-04-20 04:09:51

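I think you need to aggregate with GroupBy.size to count rows including missing values and with GroupBy.count to count rows excluding missing values, then divide the columns: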

import numpy as np

# Treat 'trialed' as a missing purchase date so that count() skips it
df['date_purchase'] = df['date_purchase'].replace('trialed', np.nan)

exp = (df.groupby(['country','product','date_install'])
         .agg(installs = ('date_purchase','size'),      # all rows, NaN included
              purchases = ('date_purchase','count')))   # non-missing rows only

# sum installs per country and install date
exp['installs'] = exp.groupby(['country','date_install'])['installs'].transform('sum')
exp['ratio'] = exp['purchases'].div(exp['installs'])

exp = exp.reset_index()
print(exp)
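To try the approach end to end, here is a minimal self-contained sketch. The six-row DataFrame is made-up sample data shaped like the table in the question (it is not the asker's real data), and the user_id column is left out because the aggregation does not use it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, shaped like the question's table (not the real data set).
df = pd.DataFrame({
    "country": ["US", "US", "US", "US", "US", "IT"],
    "product": ["monthly", "monthly", "yearly", "trialed", "trialed", "monthly"],
    "date_install": ["2020-11-18"] * 6,
    "date_purchase": ["2020-11-18", "2020-11-19", "2020-11-18",
                      "trialed", "trialed", "2020-11-18"],
})

# 'trialed' means no purchase happened, so mark it as missing.
df["date_purchase"] = df["date_purchase"].replace("trialed", np.nan)

# size counts every row in the group; count skips rows where date_purchase is NaN.
exp = df.groupby(["country", "product", "date_install"]).agg(
    installs=("date_purchase", "size"),
    purchases=("date_purchase", "count"),
)

# Replace the per-product install count with the total for the whole
# country and install date, broadcast back to every product row.
exp["installs"] = exp.groupby(["country", "date_install"])["installs"].transform("sum")
exp["ratio"] = exp["purchases"].div(exp["installs"])

exp = exp.reset_index()
print(exp)
```

With this sample, every US row ends up sharing the same installs total (5), so each ratio is that product's purchases divided by all US installs on that date, which is the shape of the desired output in the question.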
