How do I count similar email domains and print each domain only once? [python]

0 votes
3 answers
2026 views
Asked 2025-04-18 14:56

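I have a dataset containing 10 hotmail addresses, 4 gmail addresses, and 3 mail.com addresses. I want to go through these addresses, count how many belong to each domain (hotmail, gmail, and so on), and print the result. My current approach is rather brute-force, though.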

I know Python lets you write concise, elegant code (for example with itertools, islice, xrange, and so on).

The result I want is:

hotmail: 10
gmail: 4
mail.com: 3

But the result I get is:

hotmail
10
hotmail
10
...
hotmail
10
gmail
4
gmail
4
gmail
4
gmail
4
and so on

def count_domains( emails):

    for email in emails:

        current_email = email.split("@", 2)[1] # splits at @, john@mail.com => mail.com, 
                                               #2nd index in the list
        print(current_email)
        current_domain_counter = 0
        for email2 in emails:
            if current_email == email2.split("@",2)[1]:
                current_domain_counter = current_domain_counter + 1
        #print(current_email current_domain_counter)
        print(current_domain_counter)

3 Answers

0

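You are doing a bit more work than necessary (I think). Splitting the string is not actually required. You only need to check whether the whole string contains a keyword such as "@gmail.com", "@hotmail.com", or "@mail.com", and increment a separate counter for each keyword.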

gmail_counter = 0
hotmail_counter = 0
mail_counter = 0
# Add as many counters as required
for email in emails:
    # str.find returns -1 when the substring is absent
    if email.find("@gmail.com") >= 0:
        gmail_counter += 1
    elif email.find("@hotmail.com") >= 0:
        hotmail_counter += 1
    elif email.find("@mail.com") >= 0:
        mail_counter += 1
    # ...
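
The same substring idea can be written without one variable per provider. The following is only a sketch under stated assumptions: the function name count_known_domains and the suffixes parameter are hypothetical, and it uses str.endswith instead of str.find so that only the end of the address is matched. Addresses that match none of the listed suffixes are ignored.

```python
def count_known_domains(emails, suffixes=("@gmail.com", "@hotmail.com", "@mail.com")):
    """Count addresses per known suffix (hypothetical helper, not from the answer)."""
    counts = {suffix: 0 for suffix in suffixes}
    for email in emails:
        for suffix in suffixes:
            if email.endswith(suffix):
                counts[suffix] += 1
                break  # an address can match at most one suffix here
    return counts


sample = ["a@gmail.com", "b@hotmail.com", "c@mail.com", "d@gmail.com", "e@example.org"]
for suffix, count in count_known_domains(sample).items():
    print("{}: {}".format(suffix.lstrip("@"), count))
```

The limitation is the same as with the separate counters: every domain has to be listed in advance.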
2

You can use collections.Counter:

email=['me@mail.com','you@mail.com',"me@gmail.com","you@gmail.com","them@gmail.com",'you@hotmail.com',"me@hotmail.com","you@hotmail.com","them@hotmail.com"]


from collections import Counter

def count_domains(emails):
    c = Counter()
    for email in emails:
        # split at "@": john@mail.com => ["john", "mail.com"]; the domain is at index 1
        current_email = email.split("@", 2)[1]
        c.update([current_email])  # wrap in a list, otherwise each letter is counted
    print(c.most_common())  # domains ordered from most to least common
    print("gmail.com count = {}".format(c["gmail.com"]))
    print("mail.com count = {}".format(c["mail.com"]))
    print("hotmail.com count = {}".format(c["hotmail.com"]))

count_domains(email)  # the function prints its results and returns None

[('hotmail.com', 4), ('gmail.com', 3), ('mail.com', 2)]
gmail.com count = 3
mail.com count = 2
hotmail.com count = 4
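
To get one line per domain, as in the output format the question asks for, the Counter can be built directly from a generator and then iterated. This is a minimal sketch: the name domain_counts is hypothetical, and it assumes every address contains an "@" (an address without one would raise IndexError).

```python
from collections import Counter


def domain_counts(emails):
    """Return a Counter mapping each domain (text after the last '@') to its count."""
    return Counter(email.rsplit("@", 1)[1] for email in emails)


emails = ["me@mail.com", "you@mail.com", "me@gmail.com", "you@gmail.com",
          "them@gmail.com", "you@hotmail.com", "me@hotmail.com",
          "you@hotmail.com", "them@hotmail.com"]

# most_common() yields (domain, count) pairs, highest count first
for domain, count in domain_counts(emails).most_common():
    print("{}: {}".format(domain, count))
# prints:
# hotmail.com: 4
# gmail.com: 3
# mail.com: 2
```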
0

If you put all the strings into a list, say myList, you can make them unique, that is, remove the duplicate strings, like this:

uniqueList = list(set(myList))

After that, you can count how many times a string occurs, for example the first one:

countFirst = myList.count(uniqueList[0])

You can also combine these, for example:

[[domain,myList.count(domain)] for domain in set(myList)]
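
For this to count domains rather than whole addresses, the list has to hold the domain parts. A minimal sketch, assuming each address contains exactly one "@"; the names domains and counts are illustrative and not part of the answer:

```python
emails = ["a@x.com", "b@y.org", "c@x.com"]

# Keep only the part after "@", so the list holds domains instead of full addresses
domains = [email.split("@", 1)[1] for email in emails]

# One entry per unique domain; list.count rescans the list for each domain
counts = {domain: domains.count(domain) for domain in set(domains)}

for domain in sorted(counts, key=counts.get, reverse=True):
    print("{}: {}".format(domain, counts[domain]))
```

Because list.count walks the whole list once per unique domain, this is fine for a handful of addresses but slower than a single pass for large inputs.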
