MongoDB aggregation - group by unique pairs

Asked 2025-04-18 00:39

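I need to group the records in a collection by distinct pairs of fields (sender, recipient) using the PyMongo driver. The pairs are unordered: for example, (sender_field_value, recipient_field_value) and (recipient_field_value, sender_field_value) are considered equal.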

My aggregation pipeline

groups = base.flow.records.aggregate([
    {'$match': {'$or': [
                {'sender': _id},
                {'recipient': _id}
            ]
        }
    },
    {'$group': {
            '_id': {
                'sender': '$sender',
                'recipient': '$recipient',
            }, 
            'data_id': {
                '$max': '$_id'
            }
        }
    },            
    {'$limit': 20}
])

applied to the data

{ "_id" : ObjectId("533950ca9c3b6222569520c2"), "recipient" : ObjectId("533950ca9c3b6222569520c1"), "sender" : ObjectId("533950ca9c3b6222569520c0") }
{ "_id" : ObjectId("533950ca9c3b6222569520c4"), "recipient" : ObjectId("533950ca9c3b6222569520c0"), "sender" : ObjectId("533950ca9c3b6222569520c1") }

produces the following result

{'ok': 1.0,
 'result': [
    {'_id': {'recipient': ObjectId('533950ca9c3b6222569520c0'), 'sender': ObjectId('533950ca9c3b6222569520c1')},
     'data_id': ObjectId('533950ca9c3b6222569520c4')},
    {'_id': {'recipient': ObjectId('533950ca9c3b6222569520c1'), 'sender': ObjectId('533950ca9c3b6222569520c0')},
     'data_id': ObjectId('533950ca9c3b6222569520c2')}
  ]
}

But the result I actually want is

{'ok': 1.0,
 'result': [
    {'_id': {'recipient': ObjectId('533950ca9c3b6222569520c0'), 'sender': ObjectId('533950ca9c3b6222569520c1')},
     'data_id': ObjectId('533950ca9c3b6222569520c4')}
  ]
}

So what is the correct pipeline?

1 Answer


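To group by distinct pairs, we need the $group _id to be the same "thing" for both orderings of a pair. I would do this with a plain comparison that always puts the larger value first. (If your senders and recipients cannot be compared directly, this approach may not apply, and you will need to come up with something better suited to your case.)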

{$project : {
    "_id" : 1,
    "groupId" : {"$cond" : [{"$gt" : ['$sender', '$recipient']}, {big : "$sender", small : "$recipient"}, {big : "$recipient", small : "$sender"}]}
}},
{$group: {
    '_id': "$groupId",
    'data_id': {
        '$max': '$_id'
    }
}}
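To see why the comparison makes both orderings land in the same group, here is a minimal pure-Python sketch of the same idea, run against the two sample documents from the question. The helper names `pair_key` and `group_by_unordered_pair` are made up for illustration, and the ObjectIds are replaced by their hex strings (an assumption, so that the snippet runs without a database or the `bson` package).

```python
def pair_key(sender, recipient):
    """Return the same key for (a, b) and (b, a) by putting the larger value first."""
    if sender > recipient:
        return (sender, recipient)  # (big, small)
    return (recipient, sender)      # (big, small)


def group_by_unordered_pair(records):
    """Mimic the $project + $group stages: keep the max _id per unordered pair."""
    latest = {}
    for rec in records:
        key = pair_key(rec["sender"], rec["recipient"])
        if key not in latest or rec["_id"] > latest[key]:
            latest[key] = rec["_id"]
    return latest


# Sample data from the question, with ObjectIds written as plain hex strings.
records = [
    {"_id": "533950ca9c3b6222569520c2",
     "recipient": "533950ca9c3b6222569520c1",
     "sender": "533950ca9c3b6222569520c0"},
    {"_id": "533950ca9c3b6222569520c4",
     "recipient": "533950ca9c3b6222569520c0",
     "sender": "533950ca9c3b6222569520c1"},
]

groups = group_by_unordered_pair(records)
```

Both documents map to the key with `...c1` as big and `...c0` as small, so `groups` holds a single entry whose value is the larger `_id`, `...c4`.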

The complete aggregation pipeline looks like this:

{$match : {
    '$or': [{'sender': userId},{'recipient': userId}]
}},
{$project : {
    "_id" : 1,
    "groupId" : {"$cond" : [{"$gt" : ['$sender', '$recipient']}, {big : "$sender", small : "$recipient"}, {big : "$recipient", small : "$sender"}]}
}},
{$group: {
    '_id': "$groupId",
    'data_id': {
        '$max': '$_id'
    }
}},
{$limit: 20}
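Since the question uses PyMongo, the stages above, which are written in mongo shell syntax, can be sketched as a Python list of dicts. In Python every key, including `$match`, `big` and `small`, has to be a quoted string. The function name `build_pair_pipeline` and its `limit` parameter are hypothetical, introduced only for this example.

```python
def build_pair_pipeline(user_id, limit=20):
    """Build the pipeline above as plain Python dicts, ready for aggregate()."""
    # Order each pair so that (a, b) and (b, a) yield the same groupId document.
    group_id = {"$cond": [
        {"$gt": ["$sender", "$recipient"]},
        {"big": "$sender", "small": "$recipient"},
        {"big": "$recipient", "small": "$sender"},
    ]}
    return [
        {"$match": {"$or": [{"sender": user_id}, {"recipient": user_id}]}},
        {"$project": {"_id": 1, "groupId": group_id}},
        {"$group": {"_id": "$groupId", "data_id": {"$max": "$_id"}}},
        {"$limit": limit},
    ]


# Placeholder value; in the question this would be the ObjectId held in `_id`.
pipeline = build_pair_pipeline("example-user-id")

# With a live connection, this would be passed to the collection from the question:
# groups = base.flow.records.aggregate(pipeline)
```

Note that recent PyMongo versions return a cursor from `aggregate()` rather than a dict with a `'result'` key as shown in the question, so the groups would be read by iterating over it.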
