Counting unique values per unique key in a Python dictionary

0 votes
2 answers
1348 views
Asked 2025-04-28 02:28

I have a dictionary that looks like this:

yahoo.com|98.136.48.100
yahoo.com|98.136.48.105
yahoo.com|98.136.48.110
yahoo.com|98.136.48.114
yahoo.com|98.136.48.66
yahoo.com|98.136.48.71
yahoo.com|98.136.48.73
yahoo.com|98.136.48.75
yahoo.net|98.136.48.100
g03.msg.vcs0|98.136.48.105

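It contains duplicate keys and values. What I want is a final dictionary in which the keys (the IPs) are unique and each value is the number of unique domains for that IP. I have written the following code: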

for dirpath, dirs, files in os.walk(path):
    for filename in fnmatch.filter(files, '*.txt'):
        with open(os.path.join(dirpath, filename)) as f:
            for line in f:
                if line.startswith('.'):
                    ip = line.split('|',1)[1].strip('\n')
                    semi_domain = (line.rsplit('|',1)[0]).split('.',1)[1]
                    d[ip]= semi_domains
                    if ip not in d:
                        key = ip
                        val = [semi_domain]
                        domains_per_ip[key]= val

But this code does not work correctly. Can anyone help me solve this?


2 Answers

0

You can use zip to split the domains and the IPs into two tuples, then use set to get the unique entries. Strip each line first; otherwise the trailing newline makes '98.136.48.105\n' and '98.136.48.105' count as two different values. (sorted is only there to make the display order stable.)

>>> with open('words.txt') as fh:
...     f = [line.strip() for line in fh if line.strip()]
...
>>> list(zip(*[i.split('|') for i in f]))
[('yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.com', 'yahoo.net', 'g03.msg.vcs0'), ('98.136.48.100', '98.136.48.105', '98.136.48.110', '98.136.48.114', '98.136.48.66', '98.136.48.71', '98.136.48.73', '98.136.48.75', '98.136.48.100', '98.136.48.105')]
>>> [sorted(set(dom)) for dom in zip(*[i.split('|') for i in f])]
[['g03.msg.vcs0', 'yahoo.com', 'yahoo.net'], ['98.136.48.100', '98.136.48.105', '98.136.48.110', '98.136.48.114', '98.136.48.66', '98.136.48.71', '98.136.48.73', '98.136.48.75']]

Then use len to get the number of unique entries. All of this can be done in one line with a list comprehension:

>>> [len(set(dom)) for dom in zip(*[i.split('|') for i in f])]
[3, 8]
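
The one-liner above counts unique domains and unique IPs across the whole file, not per IP. For the per-IP count the question asks for, one possible sketch in the same set-based spirit is to deduplicate the (domain, IP) pairs with set and then tally the IPs with collections.Counter. The function name unique_domains_per_ip and the inline sample lines are illustrative assumptions, not part of the original answer, and the sketch assumes every non-blank line contains exactly one '|'.

```python
from collections import Counter


def unique_domains_per_ip(lines):
    """Return a Counter mapping each IP to its number of distinct domains.

    Hypothetical helper: expects an iterable of 'domain|ip' strings.
    """
    # A set of (domain, ip) tuples keeps each pairing only once
    pairs = {
        tuple(line.strip().split('|', 1))
        for line in lines
        if line.strip()
    }
    # Each surviving pair contributes one distinct domain to its IP
    return Counter(ip for _domain, ip in pairs)


# Sample lines based on the question's data, plus one repeated pair
sample = [
    'yahoo.com|98.136.48.100',
    'yahoo.com|98.136.48.105',
    'yahoo.com|98.136.48.100',  # repeated pair, counted once
    'yahoo.net|98.136.48.100',
    'g03.msg.vcs0|98.136.48.105',
]
counts = unique_domains_per_ip(sample)
print(counts['98.136.48.100'])  # 2 (yahoo.com, yahoo.net)
print(counts['98.136.48.105'])  # 2 (yahoo.com, g03.msg.vcs0)
```

Because the pairs are deduplicated before counting, a domain that appears many times for the same IP still adds only one to that IP's total.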

0

Use a defaultdict:

from collections import defaultdict

# Map each IP to the set of distinct domains seen for it
d = defaultdict(set)

with open('somefile.txt') as the_file:
    for line in the_file:
        line = line.strip()  # drop the trailing newline and stray spaces
        if line:
            value, key = line.split('|')
            d[key].add(value)

for k, v in d.items():  # use d.iteritems() in Python 2
    print('{} - {}'.format(k, len(v)))
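
To tie the defaultdict approach back to the directory walk in the question, a minimal sketch might feed every *.txt file under a root directory into the same kind of defaultdict(set). The names iter_txt_lines and count_domains_per_ip are hypothetical, and the sketch assumes each useful line has the form domain|ip. It leaves out the question's startswith('.') filter and the trimming of the first domain label, since the sample data shows neither.

```python
import fnmatch
import os
from collections import defaultdict


def iter_txt_lines(root):
    """Yield every line of every *.txt file under root (hypothetical helper)."""
    for dirpath, _dirs, files in os.walk(root):
        for filename in fnmatch.filter(files, '*.txt'):
            with open(os.path.join(dirpath, filename)) as f:
                for line in f:
                    yield line


def count_domains_per_ip(lines):
    """Map each IP to the number of distinct domains seen with it."""
    domains_per_ip = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or '|' not in line:
            continue  # skip blank or malformed lines (an assumption)
        domain, ip = line.rsplit('|', 1)
        domains_per_ip[ip].add(domain)
    return {ip: len(domains) for ip, domains in domains_per_ip.items()}


# Usage with in-memory lines; for files, pass iter_txt_lines(root) instead
print(count_domains_per_ip(['yahoo.com|98.136.48.100',
                            'yahoo.net|98.136.48.100']))
# {'98.136.48.100': 2}
```

Keeping the file walking separate from the counting means the counting function can be checked on a plain list of strings before it is pointed at real files.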
