Sorting a large number of documents in Elasticsearch

0 votes
1 answer
1007 views
Asked 2025-04-19 20:45

Whenever I need to fetch a large number of documents from an Elasticsearch index, I use Elasticsearch's scan-and-scroll technique, like this:

from elasticsearch import Elasticsearch

conn = Elasticsearch(hosts=HOSTS)

# Goal: get the documents back sorted by their 'created_at' date
the_query = {'query': {'match_all': {}}, 'sort': {'created_at': {'order': 'asc'}}}

scanResp = conn.search(index=TARGET_INDEX, doc_type=TARGET_DOC_TYPE, body=the_query, search_type='scan', scroll='10m')
scrollId = scanResp['_scroll_id']
doc_num = 1

response = conn.scroll(scroll_id=scrollId, scroll='10m')

while len(response['hits']['hits']) > 0:
    for item in response['hits']['hits']:
        print('\tDocument ' + str(doc_num) + ' of ' + str(response['hits']['total']))
        doc_num += 1

        # ====================
        #   Process the item
        # ====================
        the_doc = item['_source']

    # end for item
    scrollId = response['_scroll_id']
    # doc_num is one ahead of the number of documents processed so far
    if doc_num > response['hits']['total']:
        break
    response = conn.scroll(scroll_id=scrollId, scroll='10m')
# end of while

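However, as the Elasticsearch documentation mentions, documents retrieved this way are not sorted, so the result is not what I want.

My question: how can I sort a large number of documents in Elasticsearch?

Thanks :)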

1 Answer

1

Scrolling through a sorted result set is expensive. If you still want to do it, remove the 'scan' search type from your query: 'scan' disables sorting while you scroll.
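As a rough sketch of what that could look like, here is the question's loop with search_type='scan' dropped, so the 'sort' clause is honoured. The helper name iter_sorted_docs and the page_size parameter are made up for illustration, and the calls assume the same older elasticsearch-py client style as the question (body= and scroll= keyword arguments); doc_type is left out for brevity. Unlike a scan, a regular scrolled search already returns the first page of hits from search() itself.

```python
def iter_sorted_docs(conn, index, query, scroll='10m', page_size=500):
    """Yield every matching hit in the order given by the query's 'sort' clause.

    Hypothetical helper: 'conn' is expected to behave like the
    Elasticsearch client object used in the question.
    """
    # No search_type='scan' here: a regular scrolled search keeps the sort order.
    response = conn.search(index=index, body=query, scroll=scroll, size=page_size)

    # An empty page means the scroll is exhausted.
    while response['hits']['hits']:
        for hit in response['hits']['hits']:
            yield hit
        # Ask for the next page, always using the most recent scroll id.
        response = conn.scroll(scroll_id=response['_scroll_id'], scroll=scroll)


# Example wiring (HOSTS and TARGET_INDEX as in the question):
#
# from elasticsearch import Elasticsearch
# conn = Elasticsearch(hosts=HOSTS)
# the_query = {'query': {'match_all': {}},
#              'sort': {'created_at': {'order': 'asc'}}}
# for hit in iter_sorted_docs(conn, TARGET_INDEX, the_query):
#     the_doc = hit['_source']
```

Stopping on an empty page, rather than comparing a counter against response['hits']['total'], avoids off-by-one mistakes at the end of the result set. Depending on your client version, elasticsearch.helpers.scan with preserve_order=True may do the same thing for you.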
