Elasticsearch: returning phonetic tokens with search results

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a node reachable with the client defaults

request_body = {
    'settings': {
        'index': {
            'analysis': {
                'analyzer': {
                    'metaphone_analyzer': {
                        'tokenizer': 'standard',
                        'filter': [
                            'ascii_folding_filter',
                            'lowercase',
                            'metaphone_filter'
                        ]
                    }
                },
                'filter': {
                    # 'replace': False keeps the original token next to its phonetic code
                    'metaphone_filter': {
                        'type': 'phonetic',
                        'encoder': 'metaphone',
                        'replace': False
                    },
                    'ascii_folding_filter': {
                        'type': 'asciifolding',
                        'preserve_original': True
                    }
                }
            }
        }
    },
    'mappings': {
        'person_name': {
            'properties': {
                'full_name': {
                    'type': 'text',
                    'fields': {
                        # 'text' rather than the removed 'string' type
                        'metaphone_field': {
                            'type': 'text',
                            'analyzer': 'metaphone_analyzer'
                        }
                    }
                }
            }
        }
    }
}

res = es.indices.create(index="my_index", body=request_body)

es.search(
    index="my_index",
    body={
        "size": 5,
        "query": {
            "multi_match": {
                "query": "Jon Doe",
                "fields": "*_field"
            }
        }
    }
)

{
    'took': 1,
    'timed_out': False,
    '_shards': {
        'total': 5,
        'successful': 5,
        'skipped': 0,
        'failed': 0
    },
    'hits': {
        'total': 1,
        'max_score': 0.77749264,
        'hits': [{
            '_index': 'my_index',
            '_type': 'person_name',
            '_id': 'AWwYjl4Mqo63y_hLp5Yl',
            '_score': 0.77749264,
            '_source': {
                'full_name': 'John Doe'
            }
        }]
    }
}

1 answer

User

#1 · Posted 2024-05-15 14:49:11

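It doesn't look easy to achieve within a single Elasticsearch query, but you could try the Analyze API and scripted fields with fielddata enabled, and term vectors might come in handy. Here's how.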

Retrieving tokens from an arbitrary query

If you want to see how Elasticsearch tokenizes a query, the Analyze API is a good tool.

With your mapping you can do, for example:

GET myindex/_analyze
{
  "analyzer": "metaphone_analyzer",
  "text": "John Doe"
}

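The result lists the tokens the analyzer emits for this text: JN and john at position 0, and T and doe at position 1.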

Technically this is a separate request from the search itself, but it may still be useful.
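Since the question uses the Python client, the same Analyze API call can be sketched from Python. This is a minimal sketch, assuming an elasticsearch-py client of the 6.x/7.x generation that still accepts a `body` argument; the helper name `analyze_tokens` is hypothetical and not part of the library.

```python
def analyze_tokens(es, index, analyzer, text):
    """Return (token, position) pairs that `analyzer` emits for `text`.

    `es` is expected to be an elasticsearch-py client (assumption:
    a version whose indices.analyze() accepts `body`).
    """
    response = es.indices.analyze(
        index=index,
        body={"analyzer": analyzer, "text": text},
    )
    # Each entry in "tokens" carries the token text, its offsets and its position.
    return [(t["token"], t["position"]) for t in response["tokens"]]


# Usage, assuming a reachable cluster:
#   from elasticsearch import Elasticsearch
#   es = Elasticsearch()
#   analyze_tokens(es, "myindex", "metaphone_analyzer", "John Doe")
```

Tokens that share a position (the phonetic code and the original word) come back as separate entries with the same position value.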

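Retrieving tokens from a document's field

In theory, we could try to retrieve, from the documents matched by our query, the very same tokens that the Analyze API returned in the previous section.

In practice, Elasticsearch does not keep the tokens of a text field it has just analyzed in a form that scripts can read: fielddata is disabled by default. We need to enable it: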

PUT /myindex
{
  "mappings": {
    "person_name": {
      "properties": {
        "full_name": {
          "fields": {
            "metaphone_field": {
              "type": "text", 
              "analyzer": "metaphone_analyzer",
              "fielddata": true
            }
          }, 
          "type": "text"
        }
      }
    }
  }, 
  "settings": {
    ...
  }
}

Now we can use scripted fields to ask Elasticsearch to return those tokens.

The query may look like this:

POST myindex/_search
{
  "script_fields": {
    "my tokens": {
      "script": {
        "lang": "painless",
        "source": "doc[params.field].values",
        "params": {
          "field": "full_name.metaphone_field"
        }
      }
    }
  }
}

The response should look like this:

{
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "myindex",
        "_type": "person_name",
        "_id": "123",
        "_score": 1,
        "fields": {
          "my tokens": [
            "JN",
            "T",
            "doe",
            "john"
          ]
        }
      }
    ]
  }
}

As you can see, these are exactly the same tokens, though not in their original order.
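The scripted-field request above can be issued from the Python client as well. The following is a rough sketch, assuming fielddata is already enabled on the sub-field and that the client accepts a `body` argument as in the question; `tokens_per_hit` is a hypothetical helper name, and the `match_all` default is an assumption for illustration.

```python
def tokens_per_hit(es, index, field, query=None):
    """Map each hit's _id to the tokens stored in `field`'s fielddata."""
    body = {
        # Assumption: fall back to match_all when no query is given.
        "query": query or {"match_all": {}},
        "script_fields": {
            "my tokens": {
                "script": {
                    "lang": "painless",
                    "source": "doc[params.field].values",
                    "params": {"field": field},
                }
            }
        },
    }
    response = es.search(index=index, body=body)
    return {
        hit["_id"]: hit.get("fields", {}).get("my tokens", [])
        for hit in response["hits"]["hits"]
    }


# Usage, assuming `es` is a connected client:
#   tokens_per_hit(es, "myindex", "full_name.metaphone_field",
#                  query={"multi_match": {"query": "Jon Doe", "fields": "*_field"}})
```

Passing the original `multi_match` query as `query` returns the phonetic tokens only for the documents that actually matched.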

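Can we also retrieve the positions of these tokens in the document?

Retrieving tokens and their positions

Term vectors can help here. To use them we don't actually need fielddata enabled. We can look up the term vectors of a document: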

GET myindex/person_name/123/_termvectors
{
  "fields" : ["full_name.metaphone_field"],
  "offsets" : true,
  "positions" : true
}

This returns something like the following:

{
  "_index": "myindex",
  "_type": "person_name",
  "_id": "123",
  "_version": 1,
  "found": true,
  "took": 1,
  "term_vectors": {
    "full_name.metaphone_field": {
      "field_statistics": {
        "sum_doc_freq": 4,
        "doc_count": 1,
        "sum_ttf": 4
      },
      "terms": {
        "JN": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 4
            }
          ]
        },
        "T": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "start_offset": 5,
              "end_offset": 8
            }
          ]
        },
        "doe": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "start_offset": 5,
              "end_offset": 8
            }
          ]
        },
        "john": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 4
            }
          ]
        }
      }
    }
  }
}

This gives us a way to get the tokens of a document's field exactly as the analyzer produced them.
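To make the positions easier to read, the term vectors response can be regrouped by position on the client side. This is a small sketch in plain Python; the function name `tokens_by_position` is hypothetical, and the sample is a trimmed copy of the response shown above.

```python
from collections import defaultdict


def tokens_by_position(termvectors_response, field):
    """Group the terms of `field` by token position."""
    terms = termvectors_response["term_vectors"][field]["terms"]
    grouped = defaultdict(list)
    for term, info in terms.items():
        for occurrence in info["tokens"]:
            grouped[occurrence["position"]].append(term)
    # Sort positions and the terms within each position for stable output.
    return {pos: sorted(grouped[pos]) for pos in sorted(grouped)}


# Trimmed copy of the term vectors response from the answer.
sample = {
    "term_vectors": {
        "full_name.metaphone_field": {
            "terms": {
                "JN": {"term_freq": 1, "tokens": [{"position": 0, "start_offset": 0, "end_offset": 4}]},
                "T": {"term_freq": 1, "tokens": [{"position": 1, "start_offset": 5, "end_offset": 8}]},
                "doe": {"term_freq": 1, "tokens": [{"position": 1, "start_offset": 5, "end_offset": 8}]},
                "john": {"term_freq": 1, "tokens": [{"position": 0, "start_offset": 0, "end_offset": 4}]},
            }
        }
    }
}

print(tokens_by_position(sample, "full_name.metaphone_field"))
# {0: ['JN', 'john'], 1: ['T', 'doe']}
```

Each position holds both the phonetic code and the original word, because the phonetic filter is configured with `replace` set to false.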

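Unfortunately, as far as I know, there is no way to combine these three queries into one. Also, fielddata should be used with care, because it consumes a lot of memory.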
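Although the requests cannot be merged on the Elasticsearch side, they can be chained in client code. The following is only a sketch of that idea, assuming an elasticsearch-py 6.x-style client (with `doc_type`, as in the mapping above); `search_with_term_vectors` is a hypothetical helper, and it issues one extra term vectors request per hit, so it is best kept to small result sizes.

```python
def search_with_term_vectors(es, index, doc_type, field, query):
    """Run a search, then fetch the term vectors of `field` for every hit."""
    response = es.search(index=index, body={"query": query})
    results = []
    for hit in response["hits"]["hits"]:
        # One additional round trip per hit (N + 1 requests in total).
        vectors = es.termvectors(
            index=hit["_index"],
            doc_type=doc_type,
            id=hit["_id"],
            body={"fields": [field], "offsets": True, "positions": True},
        )
        terms = vectors.get("term_vectors", {}).get(field, {}).get("terms", {})
        results.append({
            "_id": hit["_id"],
            "_score": hit["_score"],
            "terms": sorted(terms),
        })
    return results


# Usage, assuming `es` is a connected client:
#   search_with_term_vectors(
#       es, "myindex", "person_name", "full_name.metaphone_field",
#       {"multi_match": {"query": "Jon Doe", "fields": "*_field"}},
#   )
```

This route avoids fielddata entirely, at the cost of the extra requests.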

Hope this helps!
