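How do I count similar domains in emails and print each domain only once [python]?
I have a dataset containing 10 hotmail addresses, 4 gmail addresses, and 3 mail.com addresses. I want to go through these addresses, count how many there are for each domain (hotmail, gmail, and so on), and print the result. My current approach is rather brute-force, though.
I know Python lets you write concise, elegant code (for example with itertools, islice, xrange, and so on).
The result I want is: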
hotmail: 10
gmail: 4
mail.com: 3
But the result I get is:
hotmail
10
hotmail
10
...
hotmail
10
gmail
4
gmail
4
gmail
4
gmail
4
and so on
def count_domains(emails):
    for email in emails:
        current_email = email.split("@", 2)[1]  # splits at @, john@mail.com => mail.com,
                                                # 2nd item in the list
        print(current_email)
        current_domain_counter = 0
        for email2 in emails:
            if current_email == email2.split("@", 2)[1]:
                current_domain_counter = current_domain_counter + 1
        #print(current_email current_domain_counter)
        print(current_domain_counter)
3 Answers
0
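You are doing a bit too much (I think). Splitting the string isn't actually necessary. You only need to check whether the whole string contains a keyword such as "@gmail.com", "@hotmail.com", or "@mail.com", and keep a separate counter for each keyword.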
gmail_counter = 0
hotmail_counter = 0
mail_counter = 0
# Add as many counters as required
for email in emails:
    if email.find("@gmail.com") >= 0:
        gmail_counter += 1
    elif email.find("@hotmail.com") >= 0:
        hotmail_counter += 1
    elif email.find("@mail.com") >= 0:
        mail_counter += 1
    # ...
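As a variation on the same idea, the separate counter variables could be kept in a dictionary keyed by suffix, so supporting another domain only means adding one entry. This is a minimal sketch rather than the answer's own code: the function name `count_by_suffix` and the sample addresses are made up for illustration, and it uses `str.endswith` instead of `find` so that a suffix is only matched at the end of an address.

```python
def count_by_suffix(emails, suffixes):
    """Count how many addresses end with each of the given suffixes."""
    counts = {suffix: 0 for suffix in suffixes}
    for email in emails:
        for suffix in suffixes:
            # endswith only matches the suffix at the very end of the address
            if email.lower().endswith(suffix):
                counts[suffix] += 1
                break  # an address can match at most one suffix
    return counts


# Hypothetical sample data
sample = ["a@gmail.com", "b@hotmail.com", "c@hotmail.com", "d@mail.com"]
counts = count_by_suffix(sample, ["@gmail.com", "@hotmail.com", "@mail.com"])
for suffix, n in counts.items():
    print("{}: {}".format(suffix.lstrip("@"), n))
```

Addresses whose domain is not in the suffix list are simply ignored here, which is the main limitation of any keyword-based approach.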
2
You can use collections.Counter:
from collections import Counter

email = ['me@mail.com', 'you@mail.com', "me@gmail.com", "you@gmail.com", "them@gmail.com", 'you@hotmail.com', "me@hotmail.com", "you@hotmail.com", "them@hotmail.com"]

def count_domains(emails):
    c = Counter()
    for email in emails:
        current_email = email.split("@", 2)[1]  # splits at @, john@mail.com => mail.com (2nd item in the list)
        c.update([current_email])  # wrap in a list, or update() would count each letter
    print(c.most_common())  # print most common domains
    print("gmail.com count = {}".format(c["gmail.com"]))
    print("mail.com count = {}".format(c["mail.com"]))
    print("hotmail.com count = {}".format(c["hotmail.com"]))

count_domains(email)
[('hotmail.com', 4), ('gmail.com', 3), ('mail.com', 2)]
gmail.com count = 3
mail.com count = 2
hotmail.com count = 4
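To get output in the "domain: count" shape the question asks for, without naming each domain by hand, the Counter can be built directly from a generator and then iterated. The following is a sketch under a few assumptions: every address contains an "@", domains are compared case-insensitively, and the variable names `sample` and `domain_counts` are chosen only for this example.

```python
from collections import Counter


def count_domains(emails):
    """Return a Counter mapping each domain to its number of occurrences."""
    # rsplit on the last "@" so the domain is always the final piece
    return Counter(email.rsplit("@", 1)[1].lower() for email in emails)


sample = ["me@mail.com", "you@mail.com", "me@gmail.com", "you@gmail.com",
          "them@gmail.com", "you@hotmail.com", "me@hotmail.com",
          "you@hotmail.com", "them@hotmail.com"]
domain_counts = count_domains(sample)

# most_common() yields (domain, count) pairs, highest count first
for domain, n in domain_counts.most_common():
    print("{}: {}".format(domain, n))
```

Each domain is printed exactly once because the Counter holds one entry per distinct key. An address without an "@" would raise an IndexError in this sketch, so real data may need validation first.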
0
If you put all the strings into a list, say myList, you can make them unique (that is, drop the duplicate strings) like this:
uniqueList = list(set(myList))
After that, you can count how many times a string occurs, for example the first one, like this:
countFirst = myList.count(uniqueList[0])
You can also combine the two, for example:
[[domain,myList.count(domain)] for domain in set(myList)]
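Putting these pieces together end to end might look like the sketch below. It assumes that myList holds only the domain part of each address (the answer does not say how the list is built), and the sample addresses are invented for illustration. Note that `list.count` rescans the whole list for every distinct domain, which is fine for a handful of addresses but slower than a single counting pass on large inputs.

```python
# Hypothetical sample data
emails = ["me@mail.com", "you@gmail.com", "them@gmail.com", "you@hotmail.com"]

# Assumption: myList holds only the domain part of each address
myList = [email.split("@")[1] for email in emails]

# One [domain, count] pair per distinct domain; sorted() makes the order repeatable
pairs = [[domain, myList.count(domain)] for domain in sorted(set(myList))]

for domain, n in pairs:
    print("{}: {}".format(domain, n))
```

Sorting the set is optional; it is only there because set iteration order is not guaranteed to be the same from run to run.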