How do I get the frequency of row-value sequences in Python?

Posted 2024-04-29 14:57:03


I have this dataframe:

User   Marketing_Channel
A      Direct marketing
A      Email
A      Paid Search
B      Email
B      Paid Search
C      Email
C      Paid Search

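I want to know, as a dictionary, the frequency of each purchase path, that is, how often each ordered sequence of Marketing_Channel values occurs across users. For the dataframe above, the answer should be

{'Direct marketing -> Email -> Paid Search': 1,
     'Email -> Paid Search': 2}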

2 answers
import pandas as pd

df = pd.DataFrame({'User': ['A', 'A', 'A', 'B', 'B', 'C', 'C'], 'Marketing_Channel': ['Direct_Marketing', 'Email', 'Paid_Search', 'Email', 'Paid_Search', 'Email', 'Paid_Search']})

counts = df.groupby('User')['Marketing_Channel'].apply(list).str.join(" -> ").value_counts().to_dict()

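To break it down:

groupby('User')['Marketing_Channel'].apply(list) aggregates the Marketing_Channel values into one list per User.

str.join(" -> ") joins each list into a single path string.

.value_counts().to_dict() counts the occurrences of each unique path and converts the result to a dictionary.

For this sample data, counts contains: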

{'Email -> Paid_Search': 2, 'Direct_Marketing -> Email -> Paid_Search': 1}
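The breakdown above can also be followed one step at a time. The sketch below splits the one-liner into intermediate variables so each stage can be inspected; the names channel_lists and paths are introduced here for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'User': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Marketing_Channel': ['Direct_Marketing', 'Email', 'Paid_Search',
                          'Email', 'Paid_Search', 'Email', 'Paid_Search'],
})

# Step 1: collect each user's channels into a list, keeping the original row order
# (channel_lists is an illustrative name, not from the original answer)
channel_lists = df.groupby('User')['Marketing_Channel'].apply(list)

# Step 2: join each list into a single path string
# (paths is an illustrative name, not from the original answer)
paths = channel_lists.str.join(" -> ")

# Step 3: count how often each distinct path occurs and convert to a dictionary
counts = paths.value_counts().to_dict()

print(channel_lists)
print(paths)
print(counts)
```

Because the path is built from the rows in the order they appear within each user, this assumes the dataframe is already sorted in the order the channels were visited.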

This worked for me:

import pandas as pd

# Initialize the data
x = pd.DataFrame({'User':['A','A','A','B','B','C','C'],'Marketing_Channel':['Direct marketing','Email','Paid Search','Email','Paid Search','Email','Paid Search']})

# Group by user and join the channels to get each user's journey
y = x.groupby('User').agg({'Marketing_Channel': '->'.join}).reset_index()

# Group by journey and count the users who followed it
z = y.groupby('Marketing_Channel')['User'].count()

# Serialize the path counts to a JSON string
z.to_json()
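Since the question asks for a dictionary rather than a JSON string, the approach above could be adapted along these lines. This is a minimal sketch, not the answerer's code: the names journeys, path_counts, as_dict and from_json are illustrative, and the sample data assumes the first channel for user A is 'Direct marketing', as in the question.

```python
import json
import pandas as pd

x = pd.DataFrame({
    'User': ['A', 'A', 'A', 'B', 'B', 'C', 'C'],
    'Marketing_Channel': ['Direct marketing', 'Email', 'Paid Search',
                          'Email', 'Paid Search', 'Email', 'Paid Search'],
})

# Join each user's channels, in row order, into one journey string per user
journeys = x.groupby('User')['Marketing_Channel'].agg(' -> '.join)

# Group the journeys by their own value and count how many users share each one
path_counts = journeys.groupby(journeys).size()

# Option 1: convert the counts straight to a dictionary
as_dict = path_counts.to_dict()

# Option 2: go through JSON, then parse the string back into a dictionary
from_json = json.loads(path_counts.to_json())

print(as_dict)
```

Either option yields a mapping from path string to count; to_dict() skips the round trip through a string when JSON itself is not needed.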
