Does Whoosh require all strings to be unicode?

3 votes

1 answer

2123 views

Asked 2025-04-16 22:38

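I'm migrating my search application from Solr to Whoosh and am working through the quickstart. But I keep running into problems whenever strings are involved.

>>> writer.add_document(iden=fil, content=F2T.file_to_text(fil_path))

This raises: ValueError: 'File Name.doc' is not unicode or sequence

Then:

>>> query = QueryParser("content", ix.schema).parse("first")
AssertionError: 'first' is not unicode

That line is straight from the quickstart tutorial! Does Whoosh require every field to be unicode? Making my application unicode-aware would be a hassle (and isn't really necessary). As for "not unicode or sequence", as far as I know a string is also a sequence data type.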

Tags: error handling, string handling, unicode, data types, application migration, search engine, Whoosh

1 Answer

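Yes, Whoosh requires these strings to be unicode. On Python 2, a plain literal such as "first" is a byte string (str), not a unicode object, which is why the parser rejects it. Take this line:

query = QueryParser("content", ix.schema).parse("first")

Change it to:

query = QueryParser("content", ix.schema).parse(u"first")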
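
The u prefix only helps with literals. For values that arrive at runtime, such as the file name and extracted text in the question, one option is to decode them before handing them to Whoosh. The following is a minimal sketch, not part of Whoosh: the helper name to_unicode is hypothetical, and it assumes the incoming byte strings are UTF-8 encoded, which may not hold for your data.

```python
import sys

# On Python 2 the text type is `unicode`; on Python 3 it is `str`.
# The conditional keeps `unicode` from being evaluated on Python 3.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_unicode(value, encoding="utf-8", errors="strict"):
    """Hypothetical helper: return `value` as a text (unicode) string.

    Text strings pass through unchanged; byte strings are decoded using
    `encoding` (assumed to be UTF-8 unless stated otherwise).
    """
    if isinstance(value, text_type):
        return value
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding, errors)
    raise TypeError("expected a byte or text string, got %r" % type(value))
```

With a helper like this, the failing call from the question could be written as writer.add_document(iden=to_unicode(fil), content=to_unicode(F2T.file_to_text(fil_path))), assuming F2T.file_to_text returns a byte or text string. If the files are not UTF-8, pass the actual encoding instead.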

Answered 2025-04-16 by Python大师
