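MongoDB: Remove duplicate records from a projection
How can I remove duplicate records from the results of a MongoDB query? Suppose my MongoDB documents look like this: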
{"_id":"55555454", "From":"Bob", "To":"Alice", "subject":"Hi", "date":"04102011"}
{"_id":"55555455", "From":"Bob", "To":"Dave", "subject":"Hello", "date":"04102014"}
{"_id":"55555456", "From":"Bob", "To":"Alice", "subject":"Bye", "date":"04112013"}
When I run a simple query with this command: db.col.find({}, {"From":1, "To":1, "_id":0})
it returns all three records, like this:
{"From":"Bob", "To":"Alice"} {"From":"Bob","To":"Dave"} {"From":"Bob", "To":"Alice"}
But the result I want contains only two records, like this:
{"From":"Bob", "To":"Alice"} {"From":"Bob","To":"Dave"}
Since my application is currently written in Python (using the pymongo library), I remove the duplicates from the list of records in the application like this:
result = [dict(tupleized) for tupleized in set(tuple(item.items()) for item in l)]
Is there a database-side way to handle this directly in the query and return just the two records I want?
2 Answers
0
A projection only defines which fields you want to appear in the results. It is like starting a statement with:
SELECT From, To
as opposed to the basic form:
SELECT *
So what you actually want to do is the equivalent of this:
db.collection.find(
{ "From": "Bob", "To": "Alice" },
{ "From": 1, "To": 1 }
)
This actually selects the records you want, much like the following:
SELECT From, To
FROM collection
WHERE
From = "Bob"
AND To = "Alice"
If that really does produce "duplicate" results, you can remove them with aggregation:
db.collection.aggregate([
{ "$match": {
"From": "Bob", "To": "Alice"
}},
{ "$group": {
"_id": {
"From": "$From", "To": "$To"
}
}}
])
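Since the question uses pymongo, here is a minimal sketch of the same pipeline as a plain Python list. The helper name distinct_pairs_pipeline and the trailing $project stage are my own additions, not part of the answer above; the $project stage is only there to lift the pair out of _id so the output has the flat {"From": ..., "To": ...} shape the question asks for.

```python
def distinct_pairs_pipeline(match=None):
    """Build a pipeline that yields each distinct (From, To) pair once.

    `match` is an optional filter document; the helper name and the final
    $project stage are illustrative assumptions, not from the answer.
    """
    pipeline = []
    if match:
        # Optional filter, playing the same role as the $match stage above.
        pipeline.append({"$match": match})
    # Grouping on the pair collapses duplicates into one document per pair.
    pipeline.append({"$group": {"_id": {"From": "$From", "To": "$To"}}})
    # Flatten the result: move the grouped fields out of _id.
    pipeline.append(
        {"$project": {"_id": 0, "From": "$_id.From", "To": "$_id.To"}}
    )
    return pipeline
```

With a pymongo collection such as the db.col from the question, this would be passed as db.col.aggregate(distinct_pairs_pipeline()), or with a filter like {"From": "Bob"} to restrict the pairs first.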
1
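You cannot deduplicate documents in MongoDB using find and a projection alone. The find command returns a cursor to the client, so it cannot reduce the results to unique documents by itself; a second processing step is needed.
Using the following test data (with _id omitted):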
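To illustrate what such a second pass looks like on the client, here is a rough sketch that removes duplicates from projected documents while keeping their original order. The function name unique_docs is made up for this example, and it assumes every field value is hashable (strings, numbers), so it would not work for documents containing lists or sub-documents.

```python
def unique_docs(docs):
    """Return the documents from `docs` with exact duplicates removed.

    Works on any iterable of dicts, including a pymongo cursor.
    Assumes all field values are hashable.
    """
    seen = set()
    unique = []
    for doc in docs:
        # Sort the items so field order does not affect the comparison.
        key = tuple(sorted(doc.items()))
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique


# The projected rows from the question.
rows = [
    {"From": "Bob", "To": "Alice"},
    {"From": "Bob", "To": "Dave"},
    {"From": "Bob", "To": "Alice"},
]
print(unique_docs(rows))
```

This still transfers every matching document to the client, which is why doing the grouping on the server is usually preferable for large collections.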
> db.test.find()
{ "From" : "Bob", "To" : "Alice", "subject" : "Hi", "date" : "04102011" }
{ "From" : "Bob", "To" : "Dave", "subject" : "Hello", "date" : "04102014" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Bye", "date" : "04112013" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Hi", "date" : "04102011" }
{ "From" : "Bob", "To" : "Dave", "subject" : "Hello", "date" : "04102014" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Bye", "date" : "04112013" }
{ "From" : "Bob", "To" : "Dave", "subject" : "Hello", "date" : "04102014" }
{ "From" : "Bob", "To" : "Alice", "subject" : "Bye", "date" : "04112013" }
{ "From" : "George", "To" : "Carl", "subject" : "Bye", "date" : "04112013" }
{ "From" : "David", "To" : "Carl", "subject" : "Bye", "date" : "04112013" }
You can use the aggregation framework:
> db.test.aggregate([{ $group: { _id: { "From": "$From", "To": "$To" }}}])
Result:
{
"result" : [
{
"_id" : {
"From" : "David",
"To" : "Carl"
}
},
{
"_id" : {
"From" : "George",
"To" : "Carl"
}
},
{
"_id" : {
"From" : "Bob",
"To" : "Dave"
}
},
{
"_id" : {
"From" : "Bob",
"To" : "Alice"
}
}
],
"ok" : 1
}
The Python (pymongo) code should look very similar to the aggregation pipeline above.
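As a rough sketch of that, assuming pymongo 3 or later, where aggregate() returns an iterable cursor (older drivers returned a dict with a "result" key, like the shell output above). The function name distinct_from_to and the database name in the usage comment are placeholders of mine.

```python
def distinct_from_to(collection):
    """Return each distinct From/To pair in `collection` as a flat dict.

    `collection` is expected to behave like a pymongo Collection, i.e.
    aggregate(pipeline) returns an iterable of result documents.
    """
    pipeline = [
        # One output document per distinct (From, To) combination.
        {"$group": {"_id": {"From": "$From", "To": "$To"}}},
    ]
    # Each result looks like {"_id": {"From": ..., "To": ...}}; unwrap it.
    return [doc["_id"] for doc in collection.aggregate(pipeline)]


# Usage against a real server (database name "mydb" is hypothetical):
# from pymongo import MongoClient
# coll = MongoClient()["mydb"]["test"]
# for pair in distinct_from_to(coll):
#     print(pair)
```

Note that $group does not guarantee any particular output order, so add a $sort stage if the order of the pairs matters.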