List of the latest dictionaries

Published 2024-06-16 12:40:23


I have a list of dictionaries:

my_list = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-07_18-24-22", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]

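I want a list with the duplicate dictionaries removed: where several entries share the same id, only the one with the most recent "updated_at" should be kept.

So my final list would be: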

my_list = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]

What is an efficient way to do this?


3 Answers

Two solutions, one using a dict, the other using sorting and grouping:

from itertools import groupby

my_list = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-07_18-24-22", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]


def newest_id(seq):
    """Keep id with most recent updated_at

    Return a list of kept items.
    """
    td = {}
    for e in seq:
        key = e['id']
        if key not in td or td[key]['updated_at'] < e['updated_at']:
            td[key] = e
    return list(td.values())


def newest_id2(seq):
    """Keep id with most recent updated_at

    Return a sorted list of kept items.
    """
    tl = sorted(seq, key=lambda e: (e['id'], e['updated_at']), reverse=True)
    return [next(g) for _, g in groupby(tl, key=lambda e: e['id'])]


res1 = newest_id(my_list)
res2 = newest_id2(my_list)

# Check result

res1.sort(key=lambda e: e['id'], reverse=True)
print(res1 == res2)

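You can use a dict to accumulate the items.

The dict stores the id as the key and the list item as the value. An item is inserted only if no item with the same key exists yet; if one does, the updated_at values are compared and the dict is updated when the new item is more recent.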

def generate_new_list(my_list):
    counts = {}
    for d in my_list:
        item_id = d['id']
        if item_id in counts:
            if d['updated_at'] > counts[item_id]['updated_at']:
                counts[item_id] = d
        else:
            counts[item_id] = d

    return list(counts.values())

A few more notes:

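  • If you want to keep the original order, make sure you are on Python 3.7 or later (where dicts are guaranteed to preserve insertion order) or use an OrderedDict. With a standard dict you have to pop the entry first, because replacing a value does not change the dict's order (so each item would be output at the position where its id first appeared), whereas OrderedDict has dedicated support for this use case (move_to_end).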
  • You can also use dict.get and the "null object pattern" to remove the special case:

    MISSING = {'updated_at': '0'}  # pseudo-entry that sorts before every real timestamp
    def generate_new_list(my_list):
        counts = {}
        for d in my_list:
            # Compare timestamp with timestamp; comparing against the dict itself raises TypeError
            if d['updated_at'] > counts.get(d['id'], MISSING)['updated_at']:
                counts[d['id']] = d

        return list(counts.values())
    
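  • A non-dict alternative (though it largely does not preserve the original order) is to sort by (id, updated_at), group by id, and keep only the last entry of each group. I don't think the stdlib provides a "last" operation out of the box (islice does not accept negative indices), so you either do it by hand or materialize each group into a list first.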
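
The first and third notes can be sketched as follows. This is only an illustration under stated assumptions: Python 3.7+ (insertion-ordered dicts), timestamps that compare correctly as strings, and hypothetical function names `newest_by_last_update` and `newest_by_grouping` that do not appear in the answers above.

```python
from itertools import groupby
from operator import itemgetter

my_list = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-07_18-24-22", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]


def newest_by_last_update(records):
    """Keep the newest record per id, positioned where the kept record appeared."""
    kept = {}
    for rec in records:
        key = rec["id"]
        current = kept.get(key)
        if current is None or rec["updated_at"] > current["updated_at"]:
            # Pop first so that re-inserting moves the key to the end of the dict
            # (assumes Python 3.7+ insertion-ordered dicts).
            kept.pop(key, None)
            kept[key] = rec
    return list(kept.values())


def newest_by_grouping(records):
    """Sort by (id, updated_at), group by id, keep the last record of each group."""
    ordered = sorted(records, key=itemgetter("id", "updated_at"))
    # groupby yields iterators, so each group is materialized to reach its last item.
    return [list(group)[-1] for _, group in groupby(ordered, key=itemgetter("id"))]


print([r["id"] for r in newest_by_last_update(my_list)])
print([r["id"] for r in newest_by_grouping(my_list)])
```

With the sample data, the first function returns the items in the order shown in the question, while the second returns them sorted by id.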

One approach is to restructure the data as a dict:

my_list = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-07_18-24-22", "summary": "Renewed"},
    {"id": "i8Po", "updated_at": "2020-01-08_13-16-36", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
    {"id": "7uYg", "updated_at": "2020-01-18_23-37-19", "summary": "Transferred"},
]

def getNewUpdated(myList):
    newList = {}
    for element in myList:
        if (element["id"] not in newList):
            newList[element["id"]] = element
        elif (element["updated_at"] >= newList[element["id"]]["updated_at"]):
            newList[element["id"]] = element
    return newList

print(getNewUpdated(my_list))

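Here we restructure the data into a dict in which "id" is the key and the whole element is the value. We then iterate over the list you provided and check whether the "id" already exists in newList: if it does, we overwrite that record (provided its update time is at least as recent), otherwise we add a new record.

The output looks like this: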

{
 'i8Po': {'summary': 'Renewed', 'id': 'i8Po', 'updated_at': '2020-01-08_13-16-36'},
 'yT8h': {'summary': 'Deleted', 'id': 'yT8h', 'updated_at': '2020-01-13_18-24-05'},
 '7uYg': {'summary': 'Transferred', 'id': '7uYg', 'updated_at': '2020-01-18_23-37-19'},
 'UU7t': {'summary': 'Renewed', 'id': 'UU7t', 'updated_at': '2020-01-06_16-40-00'}
}
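
Since the question asks for a list while the function above returns a dict keyed by id, a variant that returns the values may be closer to what is wanted. The sketch below also parses the timestamps instead of comparing strings; string comparison works for the sample data only because every field is zero-padded and ordered from year down to second. The name `latest_as_list`, the constant `TIMESTAMP_FORMAT`, and the format string itself are assumptions inferred from the sample data, not part of the original answer.

```python
from datetime import datetime

# Assumed format, inferred from values such as "2020-01-06_16-40-00".
TIMESTAMP_FORMAT = "%Y-%m-%d_%H-%M-%S"

records = [
    {"id": "UU7t", "updated_at": "2020-01-06_16-40-00", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-07_18-24-22", "summary": "Renewed"},
    {"id": "yT8h", "updated_at": "2020-01-13_18-24-05", "summary": "Deleted"},
]


def latest_as_list(items):
    """Return the newest record per id as a list, comparing parsed timestamps."""
    newest = {}
    for rec in items:
        stamp = datetime.strptime(rec["updated_at"], TIMESTAMP_FORMAT)
        seen = newest.get(rec["id"])
        # Keep the record when the id is new or the timestamp is at least as recent.
        if seen is None or stamp >= seen[0]:
            newest[rec["id"]] = (stamp, rec)
    # Drop the parsed timestamps and return only the original records.
    return [rec for _, rec in newest.values()]


result = latest_as_list(records)
print(result)
```

Each kept record appears at the position where its id was first seen, because overwriting a dict value does not move the key.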
