Java: Fixing terrible performance after upgrading from Lucene 4.0 to 4.1


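After upgrading from Lucene 4.0 to 4.1, the performance of my solution degraded by more than an order of magnitude. The immediate cause is the unconditional compression of stored fields. For now I am reverting to 4.0, but that is clearly not the way forward; I am hoping to find a different approach.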

I use Lucene as a database index, which means my stored fields are very short: a few words at most.

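I use a CustomScoreQuery, and in CustomScoreProvider#customScore I load all candidate documents and perform detailed word-similarity scoring against the query. I use two levels of heuristics (based on Dice's coefficient) to narrow down the candidate set, but in the last step I need to match each query word against each document word (they may appear in a different order) and compute the total score as the sum of the best word matches.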
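The final step described above can be sketched in plain Java. This is only an illustration under assumptions: the question does not say how two words are compared, so the sketch uses Dice's coefficient over character-bigram sets as the word similarity, and the names `WordMatchScore`, `bigrams`, `dice`, and `score` are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

class WordMatchScore {

    // Character bigrams of a word, e.g. "index" -> {in, nd, de, ex}.
    static Set<String> bigrams(String word) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + 1 < word.length(); i++) {
            result.add(word.substring(i, i + 2));
        }
        return result;
    }

    // Dice's coefficient over bigram sets: 2 * |common| / (|A| + |B|).
    // Assumption: sets rather than multisets, so repeated bigrams count once.
    static double dice(String a, String b) {
        Set<String> x = bigrams(a);
        Set<String> y = bigrams(b);
        if (x.isEmpty() && y.isEmpty()) {
            // Words shorter than two characters have no bigrams.
            return a.equals(b) ? 1.0 : 0.0;
        }
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        return 2.0 * common.size() / (x.size() + y.size());
    }

    // For every query word take the best-matching document word, then sum.
    // Word order does not matter because every pair is compared.
    static double score(String[] queryWords, String[] docWords) {
        double total = 0.0;
        for (String q : queryWords) {
            double best = 0.0;
            for (String d : docWords) {
                best = Math.max(best, dice(q, d));
            }
            total += best;
        }
        return total;
    }

    public static void main(String[] args) {
        String[] query = {"lucene", "index"};
        String[] doc = {"index", "lucene"};
        System.out.println(score(query, doc)); // prints 2.0
    }
}
```

The pairwise loop costs one comparison per (query word, document word) pair, which is why the cheaper heuristics are applied first to keep the candidate set small.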

How can I do this calculation differently, avoiding the pitfall of loading compressed fields during query evaluation?

# Answer 1

With Lucene 3.x I had this:

new CustomScoreQuery(bigramQuery, new FieldScoreQuery("bigram-count", Type.BYTE)) {
  protected CustomScoreProvider getCustomScoreProvider(IndexReader ir) {
    return new CustomScoreProvider(ir) {
      public float customScore(int docnum, float bigramFreq, float docBigramCount) {
         // ... calculate Dice's coefficient using bigramFreq and docBigramCount ...
         if (diceCoeff >= threshold) {
           // Loads the whole stored document just to read one short field
           String[] stems = ir.document(docnum).getValues("stems");
           // ... calculate document similarity score using stems ...
         }
      }
    };
  }
}

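This approach allowed efficient retrieval of cached float values from stored fields, which I used to get a document's bigram count. It did not allow retrieving strings, so I had to load the document to get what I needed to calculate the document similarity score. It worked well enough until Lucene 4.1 switched to compressed stored fields.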

The proper way to take advantage of the enhancements in Lucene 4 is to read the values from DocValues, like this:

new CustomScoreQuery(bigramQuery) {
  protected CustomScoreProvider getCustomScoreProvider(ReaderContext rc) {
    final AtomicReader ir = ((AtomicReaderContext)rc).reader();
    // Per-segment DocValues sources; values are addressed directly by docnum
    final DocValues.Source
       bgCountSrc = ir.docValues("bigram-count").getSource(),
       stemSrc = ir.docValues("stems").getSource();
    return new CustomScoreProvider(rc) {
      public float customScore(int docnum, float bgFreq, float... fScores) {
        final long bgCount = bgCountSrc.getInt(docnum);
        // ... calculate Dice's coefficient using bgFreq and bgCount ...
        if (diceCoeff >= threshold) {
          final String stems =
             stemSrc.getBytes(docnum, new BytesRef()).utf8ToString();
          // ... calculate document similarity score using stems ...
        }
      }
    };
  }
}

This improved performance from 16 ms (Lucene 3.x) to 10 ms (Lucene 4.x).
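To make the difference in access pattern concrete, here is a rough stdlib-only sketch. It is not Lucene code: `java.util.zip` stands in for the stored-fields compression (it is not the codec Lucene uses), a plain array stands in for a per-document value column, and every name in it (`ColumnVsCompressedBlock`, `packBlock`, `readFromBlock`, `readFromColumn`, `sampleStems`) is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class ColumnVsCompressedBlock {

    // Compress a block of bytes (stand-in for a chunk of stored documents).
    static byte[] compress(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate a whole block; there is no way to skip to a single record.
    static byte[] decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && inflater.needsInput()) {
                    break; // truncated input; stop instead of spinning
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // One short value per document, joined by newlines and compressed together.
    static byte[] packBlock(String[] values) {
        return compress(String.join("\n", values).getBytes(StandardCharsets.UTF_8));
    }

    // Stored-field style: inflate the entire block to read one short value.
    static String readFromBlock(byte[] packedBlock, int docnum) {
        String all = new String(decompress(packedBlock), StandardCharsets.UTF_8);
        return all.split("\n")[docnum];
    }

    // Column style: the value is addressed directly by document number.
    static String readFromColumn(String[] column, int docnum) {
        return column[docnum];
    }

    // Made-up sample data: short fields of a couple of words each.
    static String[] sampleStems(int count) {
        String[] values = new String[count];
        for (int i = 0; i < count; i++) {
            values[i] = "stem" + i + " word" + i;
        }
        return values;
    }

    public static void main(String[] args) {
        String[] stems = sampleStems(1000);
        byte[] block = packBlock(stems);
        // Same answer either way; only the work per lookup differs.
        System.out.println(readFromBlock(block, 42).equals(readFromColumn(stems, 42))); // prints true
    }
}
```

Both reads return the same string, but `readFromBlock` does work proportional to the whole block on every call, which hurts when a scorer calls it once per candidate document, while `readFromColumn` is a single array access.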
