Count unique users per category in a nested list in Python

Posted 2024-06-01 01:18:08


I have a list of two-element sublists. It looks like this:

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

I want to count the number of unique users in each category.

Required:

required = [['referral',3],['affiliate',3],['cpc',4],['orgainic',2]]

The output I get:

{'referral': 3, 'affiliate': 2, 'cpc': 4, 'orgainic': 3}

The counts are wrong.

Here is the code I tried:

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

required = [['referral',3],['affiliate',3],['cpc',4],['orgainic',2]]

c = {}
visits = []
for i in a:
    # print(i)
    for j in i[1:]:
        if j not in c and i[0] not in visits:
            c[j] = 1
            visits.append(i[0])
        elif j in c and i[0] not in visits:
            c[j] = c[j]+1
print(c)

Please help me fix this.


3 answers

This sounds like a case for pandas; your list is already in the right shape:

import pandas as pd
a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

df = pd.DataFrame(a)
df.columns=["user", "type"]

unique_per_type = df.groupby("type")["user"].unique()

unique_per_type is now:

type
affiliate            [user1, user7, user9]
cpc          [user4, user14, user2, user8]
orgainic                    [user3, user2]
referral             [user1, user2, user4]
Name: user, dtype: object

You can use it like this:

# access length by key
len(unique_per_type["affiliate"]) 

# or use it like a dict
for key, val in unique_per_type.items():
    print(key, len(val))

This solution pulls in pandas, which is a heavy dependency. But once your data is in a DataFrame, you can do a lot more with it:

df["user"].unique() # shows all unique users

df.query("user=='user1'") # shows all observations involving user1

First, let's make the entries unique:

c = {tuple(sublist) for sublist in a}

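Now we have a set of unique (user, type) pairs.

For counting we don't need the user, so let's reduce it to a list that holds only the second element of each pair: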

c = [elem[1] for elem in c]

Now counting is easy:

from collections import Counter
c = Counter(c)

Result: Counter({'cpc': 4, 'affiliate': 3, 'referral': 3, 'orgainic': 2})


Putting it all together:

from collections import Counter

c = Counter(elem[1] for elem in {tuple(sublist) for sublist in a})
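If the result is needed in the list-of-lists shape shown in the question, one possible sketch is to de-duplicate with dict.fromkeys instead of a set, so that categories come out in first-seen order. This assumes Python 3.7+, where dict and Counter preserve insertion order; the names unique_pairs and required_format are illustrative only.

```python
from collections import Counter

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'],
     ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'],
     ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'],
     ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'],
     ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'],
     ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

# dict.fromkeys drops duplicate (user, category) pairs but keeps first-seen order
unique_pairs = dict.fromkeys(tuple(sublist) for sublist in a)

# Count how many unique pairs fall into each category
counts = Counter(category for _user, category in unique_pairs)

# Convert to [[category, count], ...] as requested in the question
required_format = [[category, n] for category, n in counts.items()]
print(required_format)
```

With a plain set the counts are the same, but the order of the categories is not guaranteed.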

Here is an approach using collections.defaultdict.

For example:

from collections import defaultdict

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'], ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'], ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'], ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'], ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'], ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]
result = defaultdict(int)
seen = set()
for k, v in a:
    key = (k, v)  # a tuple key cannot collide the way a joined string could
    if key not in seen:
        result[v] += 1
        seen.add(key)
print(list(map(list, result.items())))

Output:

[['referral', 3], ['affiliate', 3], ['cpc', 4], ['orgainic', 2]]
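A variant of the same idea, offered as a sketch rather than as this answer's method: collect the users into one set per category, then take the size of each set. It also makes visible why the code in the question miscounts: its visits list is shared across all categories, and it is not updated in the elif branch. The name users_by_category is illustrative only.

```python
from collections import defaultdict

a = [['user1', 'referral'], ['user2', 'referral'], ['user1', 'referral'],
     ['user1', 'affiliate'], ['user7', 'affiliate'], ['user1', 'affiliate'],
     ['user9', 'affiliate'], ['user4', 'cpc'], ['user4', 'referral'],
     ['user2', 'referral'], ['user7', 'affiliate'], ['user14', 'cpc'],
     ['user3', 'orgainic'], ['user2', 'orgainic'], ['user4', 'cpc'],
     ['user2', 'cpc'], ['user8', 'cpc'], ['user2', 'orgainic']]

# One set of users per category; adding a user twice has no effect
users_by_category = defaultdict(set)
for user, category in a:
    users_by_category[category].add(user)

# The number of unique users is simply the size of each set
result = [[category, len(users)] for category, users in users_by_category.items()]
print(result)
```

Keeping the sets around costs a little more memory than plain counters, but it lets you inspect which users belong to each category afterwards.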
