Python: Whoosh seems to return incorrect results

Asked 2025-04-18 15:41

This code comes straight from the Whoosh quickstart documentation:

import os.path
from whoosh.index import create_in
from whoosh.fields import Schema, STORED, ID, KEYWORD, TEXT
from whoosh.index import open_dir
from whoosh.query import *
from whoosh.qparser import QueryParser

#establish schema to be used in the index
schema = Schema(title=TEXT(stored=True), content=TEXT,
                path=ID(stored=True), tags=KEYWORD, icon=STORED)

#create index directory
if not os.path.exists("index"):
    os.mkdir("index")

#create the index using the schema specified above
ix = create_in("index", schema)

#instantiate the writer object
writer = ix.writer()

#add the docs to the index
writer.add_document(title=u"My document", content=u"This is my document!",
                    path=u"/a", tags=u"first short", icon=u"/icons/star.png")
writer.add_document(title=u"Second try", content=u"This is the second example.",
                    path=u"/b", tags=u"second short", icon=u"/icons/sheep.png")
writer.add_document(title=u"Third time's the charm", content=u"Examples are many.",
                    path=u"/c", tags=u"short", icon=u"/icons/book.png")

#commit those changes
writer.commit()

#identify searcher
with ix.searcher() as searcher:

    #specify parser
    parser = QueryParser("content", ix.schema)

    #specify query -- try also "second"
    myquery = parser.parse("is")

    #search for results
    results = searcher.search(myquery)

    #identify the number of matching documents
    print(len(results))

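I pass a single value, the verb "is", to the parser.parse() call. When I run the code, the result has length zero rather than the two I expected. If I replace "is" with "second", I get one result, which is what I expect. So why does searching for "is" return no matches?

Edit

As @Philippe pointed out, Whoosh's default analyzer strips out certain common words (stop words), which explains the behavior described above. If you want to keep these words, you can specify the analyzer to use when you define a field in the schema, and pass that analyzer an argument telling it not to remove stop words. Note that the query above searches the content field, so that is the field whose analyzer needs changing. The line below shows the syntax on the title field; for example: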

from whoosh import analysis

schema = Schema(title=TEXT(stored=True, analyzer=analysis.StandardAnalyzer(stoplist=None)))

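1 Answer

The default text analyzer applies a stop-word filter. Stop words are words that carry little meaning for text processing, such as "the", "is", and "in". The filter drops them both when documents are indexed and when the query is parsed, so a query made up only of stop words matches nothing.

See the documentation for more details: http://whoosh.readthedocs.org/en/latest/api/analysis.html#whoosh.analysis.StopFilter

Answered 2025-04-18 by Python大师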
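
For intuition, here is a rough pure-Python sketch of what a stop-word filter does. It is not Whoosh's implementation: the `analyze` function and the shortened `STOP_WORDS` set are made up for illustration, and Whoosh ships its own, longer stop list.

```python
import re

# Hypothetical, abbreviated stop list for illustration only.
STOP_WORDS = frozenset(["a", "an", "and", "in", "is", "of", "the", "this", "to"])


def analyze(text, stoplist=STOP_WORDS):
    """Split text into words, lowercase them, then drop any stop words."""
    words = [w.lower() for w in re.findall(r"\w+", text)]
    if stoplist is None:
        # No stop list given: keep every token.
        return words
    return [w for w in words if w not in stoplist]


indexed_terms = analyze("This is my document!")
query_terms = analyze("is")
unfiltered_query_terms = analyze("is", stoplist=None)

print(indexed_terms)
print(query_terms)
print(unfiltered_query_terms)
```

Since the same filtering is applied to the document and to the query, "is" never reaches the index and the query for it comes out empty, so there is nothing left to match.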
