Remove duplicates in an aggregation and set a limit in MongoDB

Posted 2024-06-01 01:08:08


I have a dataset (sample):

{u'geometry': {u'type': u'Point', u'coordinates': [151.5162, -9.44365]}, u'_id': ObjectId('5ad70f71f2119236741ffb39'), u'type': u'Feature', u'properties': {u'POS_ID': u'592795', u'STATUS': u'0', u'TIMESTAMP': u'2013-12-31 18:52:00.000', u'MMSI': u'636015725'}}

{u'geometry': {u'type': u'Point', u'coordinates': [119.0369, -0.3608933]}, u'_id': ObjectId('5ad70f71f2119236741ffb0d'), u'type': u'Feature', u'properties': {u'POS_ID': u'592557', u'STATUS': u'0', u'TIMESTAMP': u'2013-12-31 18:49:00.000', u'MMSI': u'636092156'}} 

{u'geometry': {u'type': u'Point', u'coordinates': [158.1707, -0.9142034]}, u'_id': ObjectId('5ad85e210b2d50e1174f5d29'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'15',   u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}} 

{u'geometry': {u'type': u'Point', u'coordinates': [158.2707, -0.8142034]}, u'_id': ObjectId('5ad85e2c0b2d50e1174f5d2a'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'15', u'TIMESTAMP': u'2013-12-31 17:04:00.000',u'MMSI': u'503551000'}} 

{u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'_id': ObjectId('5ad878c05b66f42caf578c45'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000' }} 

{u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'_id': ObjectId('5ad878c45b66f42caf578c46'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10',  u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}

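I want to take two random records, but keep every record that shares the same MMSI. More specifically, as you can see, the last four records have the same MMSI. If I fetch 2 random records, I want to get back: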

{u'geometry': {u'type': u'Point', u'coordinates': [119.0369, -0.3608933]}, u'_id': ObjectId('5ad70f71f2119236741ffb0d'), u'type': u'Feature', u'properties': {u'POS_ID': u'592557', u'STATUS': u'0', u'TIMESTAMP': u'2013-12-31 18:49:00.000', u'MMSI': u'636092156'}} 

{u'geometry': {u'type': u'Point', u'coordinates': [158.1707, -0.9142034]}, u'_id': ObjectId('5ad85e210b2d50e1174f5d29'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'15',   u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}} 

{u'geometry': {u'type': u'Point', u'coordinates': [158.2707, -0.8142034]}, u'_id': ObjectId('5ad85e2c0b2d50e1174f5d2a'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'15', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}

{u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'_id': ObjectId('5ad878c05b66f42caf578c45'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000' }} 

{u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'_id': ObjectId('5ad878c45b66f42caf578c46'), u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10',  u'TIMESTAMP': u'2013-12-31 17:04:00.000', u'MMSI': u'503551000'}}

The first MMSI = 636092156 (1 record) and the second MMSI = 503551000 (4 records).

In SQL, I would want something like this:

select * from table where MMSI in (select distinct(MMSI) from table limit 2);
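To make the intended result concrete, here is a minimal in-memory sketch of the same semantics in Python 3: pick N distinct MMSI values, then return every record carrying one of them. The function name `records_for_sampled_mmsi` and the trimmed `sample` list are illustrative assumptions, not part of the original data or of any library.

```python
import random

def records_for_sampled_mmsi(records, n, rng=random):
    """Return every record whose MMSI is among n randomly chosen distinct MMSIs."""
    # Distinct MMSI values, in first-seen order.
    distinct = list(dict.fromkeys(r["properties"]["MMSI"] for r in records))
    # Choose at most n of them at random.
    chosen = set(rng.sample(distinct, min(n, len(distinct))))
    # Keep all records belonging to the chosen MMSIs.
    return [r for r in records if r["properties"]["MMSI"] in chosen]

# Trimmed stand-in for the documents above (hypothetical, properties only).
sample = [
    {"properties": {"MMSI": "636015725", "POS_ID": "592795"}},
    {"properties": {"MMSI": "636092156", "POS_ID": "592557"}},
    {"properties": {"MMSI": "503551000", "POS_ID": "132856", "STATUS": "15"}},
    {"properties": {"MMSI": "503551000", "POS_ID": "132856", "STATUS": "15"}},
    {"properties": {"MMSI": "503551000", "POS_ID": "132856", "STATUS": "10"}},
    {"properties": {"MMSI": "503551000", "POS_ID": "132856", "STATUS": "10"}},
]

picked = records_for_sampled_mmsi(sample, 2)
for rec in picked:
    print(rec["properties"]["MMSI"], rec["properties"]["POS_ID"])
```

The number of returned records varies with the MMSIs drawn; only the number of distinct MMSIs is bounded by 2.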

So far, I have this query:

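getlimitShips = db.samplecol.aggregate([
    # Self-join: attach every record with the same MMSI as the array "ff"
    {"$lookup": {"from": "samplecol", "localField": "properties.MMSI", "foreignField": "properties.MMSI", "as": "ff"}},
    {"$limit": 97},
    # Exclude _id and some fields of the joined records
    {"$project": {"_id": 0, "ff.properties.POS_ID": 0, "ff.properties.STATUS": 0, "ff.properties.TIMESTAMP": 0}},
])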

count_lim = 0
for limS in getlimitShips:
    print "SHIP:", limS["properties"]["MMSI"],"\n"
    count_lim = count_lim +1
    print "Record",count_lim,": ", limS,"\n"

It returns:

...

...

SHIP: 503551000

Record 97 : {u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'MMSI': u'503551000', u'COURSE': u'12', u'TIMESTAMP': u'2013-12-31 17:04:00.000'}, u'ff': [{u'geometry': {u'coordinates': [141.8705, -12.67311]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.2707, -0.8142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}]}

SHIP: 503551000

Record 104 : {u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'MMSI': u'503551000', u'COURSE': u'10', u'TIMESTAMP': u'2013-12-31 17:04:00.000'}, u'ff': [{u'geometry': {u'coordinates': [141.8705, -12.67311]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.2707, -0.8142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}]}

SHIP: 503551000

Record 105 : {u'geometry': {u'type': u'Point', u'coordinates': [157.1707, -0.9142034]}, u'type': u'Feature', u'properties': {u'POS_ID': u'132856', u'STATUS': u'10', u'MMSI': u'503551000', u'COURSE': u'4', u'TIMESTAMP': u'2013-12-31 17:04:00.000'}, u'ff': [{u'geometry': {u'coordinates': [141.8705, -12.67311]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [158.2707, -0.8142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}, {u'geometry': {u'coordinates': [157.1707, -0.9142034]}, u'properties': {u'MMSI': u'503551000'}}]}

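As you can see, the query returns the aggregated result once for every matching record in Mongo, so the same "ff" array is repeated for each record with that MMSI. Does anyone know how to remove the duplicates in the query and return each aggregated result only once?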
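One possible direction, offered as a sketch rather than a confirmed answer: group by `properties.MMSI` first so that each MMSI yields exactly one result document, then limit or sample the groups. The helper `build_pipeline` and its parameters are hypothetical names; the stages used (`$group`, `$sample`, `$limit`, `$unwind`, `$replaceRoot`) are standard aggregation stages, with `$sample` assumed available (MongoDB 3.2+) and `$replaceRoot` assumed available (3.4+). The code is Python 3.

```python
def build_pipeline(n, randomize=True, flatten=False):
    """Build an aggregation pipeline returning the records of n distinct MMSIs."""
    pipeline = [
        # One document per MMSI, holding all of its records in "records".
        {"$group": {"_id": "$properties.MMSI", "records": {"$push": "$$ROOT"}}},
        # Keep n groups: randomly with $sample, or simply the first n with $limit.
        {"$sample": {"size": n}} if randomize else {"$limit": n},
    ]
    if flatten:
        # Optional: turn each group back into its original, separate records.
        pipeline += [
            {"$unwind": "$records"},
            {"$replaceRoot": {"newRoot": "$records"}},
        ]
    return pipeline

pipeline = build_pipeline(2)

# Assumed usage with the collection from the question (requires a live server):
# for ship in db.samplecol.aggregate(pipeline):
#     print("SHIP:", ship["_id"], "records:", len(ship["records"]))
```

With `flatten=False`, each result has the MMSI in `_id` and its records in `records`, so no ship is repeated. With `flatten=True`, the output is the flat list of records described in the question.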
