Elasticsearch does not find the last word of a sentence when it ends with a period

Posted 2024-05-12 13:51:35


I am using Elasticsearch with the following settings:

ES = {
"mappings": {
    ES_DOC_TYPE: {
        "properties": {
            "message": {
                "type": "string",
                "analyzer": "liza_analyzer",
                "include_in_all": False
            }
        }
    }
},
"settings": {
    "number_of_shards": 4,
    "analysis": {
        "tokenizer": {
            "liza_tokenizer": {
                "type": "pattern",
                "pattern": r"(\. )|[\s,\[\]\(\)\"\!\'\?\`\*\;\:\/<>«»\#]+",
                "flags": "UNICODE_CASE"
            }
        },
        "analyzer": {
            "liza_analyzer": {
                "type": "custom",
                "tokenizer": "liza_tokenizer",
                "filter": ["lowercase"]
            }
        },
    }
}
}

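When I search for the word "hello" in the sentence "hello world", Elasticsearch finds it.

When I search for the word "hello" in the sentence "hello. world", Elasticsearch finds it.

When I search for the word "hello" in the sentence "hello", Elasticsearch finds it too.

But when I search for the word "hello" in the sentence "hello." (with a period at the end), Elasticsearch does not find it.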

Meanwhile, the tokens for the last two sentences look like this:

{
"tokens": [{
    "token": "hello",
    "start_offset": 0,
    "end_offset": 5,
    "type": "<ALPHANUM>",
    "position": 0
}]
}

(exactly the same for both)

The question is: why does this happen, and how do I fix it?


1 Answer

User

#1 · Posted 2024-05-12 13:51:35

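Your pattern is wrong. `(\. )` only matches a period that is followed by a space, so a period at the very end of the text stays attached to the last word. It should be:

"pattern": r"(\.\s*)|[\s,\[\]\(\)\"\!\'\?\`\*\;\:\/<>«»\#]+"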

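The difference between the two patterns can be checked outside Elasticsearch. The following is a minimal sketch, assuming Python's `re` module behaves closely enough to the Java regex engine used by the pattern tokenizer for these inputs. `tokenize` is a hypothetical helper written for this illustration, not an Elasticsearch API; it treats every match as a separator, drops empty tokens, and lowercases the rest to approximate `liza_analyzer`.

```python
import re

# Separator character class shared by both patterns, copied from the question.
SEPARATORS = r"[\s,\[\]\(\)\"\!\'\?\`\*\;\:\/<>«»\#]+"

ORIGINAL = r"(\. )|" + SEPARATORS   # a period counts only when a space follows it
FIXED = r"(\.\s*)|" + SEPARATORS    # a period counts with or without whitespace


def tokenize(pattern, text):
    """Hypothetical stand-in for a pattern tokenizer plus a lowercase filter.

    Every regex match is treated as a separator; the text between
    separators becomes a token, and empty tokens are dropped.
    """
    tokens = []
    start = 0
    for match in re.finditer(pattern, text):
        if match.start() > start:
            tokens.append(text[start:match.start()].lower())
        start = match.end()
    if start < len(text):
        tokens.append(text[start:].lower())
    return tokens


if __name__ == "__main__":
    for sentence in ["hello world", "hello. world", "hello", "hello."]:
        print(repr(sentence),
              tokenize(ORIGINAL, sentence),
              tokenize(FIXED, sentence))
    # For 'hello.' the original pattern yields ['hello.'],
    # while the fixed pattern yields ['hello'].
```

With the original pattern the last sentence is indexed as the single token `hello.`, which a search for `hello` does not match. One side effect of the fixed pattern worth keeping in mind: `\.\s*` matches every period, so text such as `v1.2` is split into `v1` and `2`. Separately, the `<ALPHANUM>` type in the token output above is what the standard tokenizer emits, so that output was likely produced without `liza_analyzer` applied, which would explain why both sentences appear to produce identical tokens.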