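Need to remove duplicates from a list of dictionaries and modify the data of the remaining duplicate (Python)
Consider this simple Python list of dictionaries (in each dictionary, the first item is a string and the second is a Widget object):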
raw_results =
[{'src': 'tag', 'widget': <Widget: to complete a form today>}, # dupe 1a
{'src': 'tag', 'widget': <Widget: a newspaper>}, # dupe 2a
{'src': 'zip', 'widget': <Widget: to complete a form today>}, # dupe 1b
{'src': 'zip', 'widget': <Widget: the new Jack Johnson album>},
{'src': 'zip', 'widget': <Widget: a newspaper>}, # dupe 2b
{'src': 'zip', 'widget': <Widget: premium dog food >}]
I want to loop through this list and remove the duplicates, and this StackOverflow question gave me the answer:
known_widgets = set()
processed_results = []
for x in raw_results:
    widget = x['widget']
    if widget in known_widgets:
        continue
    else:
        processed_results.append(x)
        known_widgets.add(widget)
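However, after removing a duplicate row (say, dupe 1b), I want to modify the "src" data of the duplicate that remains (say, dupe 1a): the "src" of the removed duplicate should be appended to that of the original item. This is the result I want: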
processed_results =
[{'src': 'tag-zip', 'widget': <Widget: to complete a form today>}, # dupe 1a
{'src': 'tag-zip', 'widget': <Widget: a newspaper>}, # dupe 2a
{'src': 'zip', 'widget': <Widget: the new Jack Johnson album>},
{'src': 'zip', 'widget': <Widget: premium dog food >}]
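I'm sure this is simple, but I've had too much coffee, my thinking is muddled, and I've spent hours going in circles on this problem. I'd really appreciate some expert help. Thanks!
2 Answers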
1
Assuming you want to group the list of widgets by their src value, this is what you need:
class Widget(object):
    def __init__(self, desc):
        self.desc = desc
    def __str__(self):
        return "Widget(%s)" % self.desc

raw_results = [
    {'src':'tag-zip', 'widget':Widget('to complete a form today')},
    {'src':'tag-zip', 'widget':Widget('a newspaper')},
    {'src':'zip', 'widget':Widget('the new Jack Johnson album')},
    {'src':'zip', 'widget':Widget('premium dog food')}
]

from collections import defaultdict

# Group the widgets by their 'src' value.
known_widgets = defaultdict(list)
for x in raw_results:
    k, v = x['src'], x['widget']
    known_widgets[k].append(v)

for k, v in known_widgets.items():
    print("%s: %s" % (k, ",".join(str(w) for w in v)))
If you also want to remove the duplicate widgets, do this:
class Widget(object):
    def __init__(self, desc):
        self.desc = desc
    def __str__(self):
        return "Widget(%s)" % self.desc
    # Widgets with the same description hash and compare as equal,
    # so a set keeps only one of them.
    def __hash__(self):
        return hash(self.desc)
    def __eq__(self, other):
        return isinstance(other, Widget) and self.desc == other.desc

raw_results = [
    {'src':'tag-zip', 'widget':Widget('to complete a form today')},
    {'src':'tag-zip', 'widget':Widget('a newspaper')},
    {'src':'zip', 'widget':Widget('the new Jack Johnson album')},
    {'src':'zip', 'widget':Widget('premium dog food')},
    {'src':'tag-zip', 'widget':Widget('to complete a form today')},
    {'src':'tag-zip', 'widget':Widget('a newspaper')},
    {'src':'zip', 'widget':Widget('the new Jack Johnson album')},
    {'src':'zip', 'widget':Widget('premium dog food')},
]

from collections import defaultdict

# Group by 'src' again, but collect into sets to drop duplicate widgets.
known_widgets = defaultdict(set)
for x in raw_results:
    k, v = x['src'], x['widget']
    known_widgets[k].add(v)

for k, v in known_widgets.items():
    print("%s: %s" % (k, ",".join(str(w) for w in v)))
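To see why the deduplicating version defines a hash and an equality method, here is a minimal sketch. A set treats two objects as the same element only when they hash equal and compare equal, and by default Python objects compare by identity. `PlainWidget` and `KeyedWidget` are hypothetical names used only for this illustration; they are not the asker's real Widget class.

```python
class PlainWidget:
    """Hypothetical widget that keeps the default identity-based equality."""
    def __init__(self, desc):
        self.desc = desc


class KeyedWidget:
    """Hypothetical widget that hashes and compares by its description."""
    def __init__(self, desc):
        self.desc = desc

    def __hash__(self):
        return hash(self.desc)

    def __eq__(self, other):
        return isinstance(other, KeyedWidget) and self.desc == other.desc


# Two distinct objects with the same description in each set.
plain = {PlainWidget('a newspaper'), PlainWidget('a newspaper')}
keyed = {KeyedWidget('a newspaper'), KeyedWidget('a newspaper')}

print(len(plain), len(keyed))  # 2 1
```

If the real Widget class already defines equality and hashing (the `widget in known_widgets` check in the question suggests it is at least hashable), nothing extra is needed.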
2
def find_widget(widget, L):
    # Return the index of the entry in L that holds this widget.
    for i, v in enumerate(L):
        if v['widget'] == widget:
            return i

known_widgets = set()
processed_results = []
for x in raw_results:
    widget = x['widget']
    if widget in known_widgets:
        # Duplicate: append its 'src' to the entry that was kept.
        processed_results[find_widget(widget, processed_results)]['src'] += '-%s' % x['src']
        continue
    else:
        processed_results.append(x)
        known_widgets.add(widget)
There is probably a better way to do this, since it processes each duplicate widget twice.
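One possible way to avoid the second pass is to keep a dictionary from each widget to the entry that was kept, so a duplicate can update that entry directly. This is only a sketch under assumptions: the `Widget` class below is a stand-in that hashes and compares by description, and `merge_duplicates` and `first_seen` are hypothetical names, not part of the original code.

```python
class Widget:
    """Stand-in for the real Widget; assumed hashable and comparable."""
    def __init__(self, desc):
        self.desc = desc

    def __repr__(self):
        return "<Widget: %s>" % self.desc

    def __hash__(self):
        return hash(self.desc)

    def __eq__(self, other):
        return isinstance(other, Widget) and self.desc == other.desc


def merge_duplicates(raw_results):
    """Keep the first entry per widget and append later 'src' values to it."""
    first_seen = {}          # widget -> the entry kept in processed_results
    processed_results = []
    for x in raw_results:
        entry = first_seen.get(x['widget'])
        if entry is None:
            entry = dict(x)  # copy, so raw_results is left unmodified
            first_seen[x['widget']] = entry
            processed_results.append(entry)
        else:
            entry['src'] += '-%s' % x['src']
    return processed_results


raw_results = [
    {'src': 'tag', 'widget': Widget('to complete a form today')},
    {'src': 'tag', 'widget': Widget('a newspaper')},
    {'src': 'zip', 'widget': Widget('to complete a form today')},
    {'src': 'zip', 'widget': Widget('the new Jack Johnson album')},
    {'src': 'zip', 'widget': Widget('a newspaper')},
    {'src': 'zip', 'widget': Widget('premium dog food')},
]

for entry in merge_duplicates(raw_results):
    print(entry['src'], entry['widget'])
```

Each input entry is handled once, and the dictionary lookup replaces the linear search in `find_widget`. Note that a widget appearing twice with the same source would end up with a repeated value such as 'zip-zip'; whether that is acceptable depends on the data.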