How to compute word support and confidence with Lucene in Java

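I want to use Lucene in a Java application to compute the support and confidence of words. I have more than 500 .txt documents and an ArrayList that contains two terms, term i and term j.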

The formula for confidence:

Confidence = D(ti, tj) / D(ti)

D(ti, tj): number of documents that contain both term i and term j
D(ti): number of documents that contain term i

The formula for support:

Support = D(ti, tj) / D

D(ti, tj): number of documents that contain both term i and term j
D: total number of documents in the collection

Is it possible to use Lucene to search for and count these terms? Which classes should I use?


1 answer

  1. Answer #1

    I would simply search for your two terms, termI and termJ, and take the counts from the totalHits returned by each search.

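    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    // Written against the older Lucene API (mutable BooleanQuery, int totalHits).
    // Newer releases build the query with BooleanQuery.Builder instead.
    int docCount = indexReader.numDocs();  // D: documents in the index
    IndexSearcher searcher = new IndexSearcher(indexReader);

    // TermQuery is not analyzed, so termI and termJ must match the indexed form.
    Query queryI = new TermQuery(new Term(fieldName, termI));
    Query queryJ = new TermQuery(new Term(fieldName, termJ));

    // MUST on both clauses matches only documents that contain term i AND term j;
    // SHOULD would also count documents that contain just one of them.
    BooleanQuery queryIJ = new BooleanQuery();
    queryIJ.add(queryI, BooleanClause.Occur.MUST);
    queryIJ.add(queryJ, BooleanClause.Occur.MUST);

    // Only totalHits is read, so asking for a single hit per search is enough.
    int countI = searcher.search(queryI, 1).totalHits;
    int countIJ = searcher.search(queryIJ, 1).totalHits;

    // Guard against division by zero when no document contains term i.
    float confidence = countI == 0 ? 0f : (float) countIJ / countI;
    float support = docCount == 0 ? 0f : (float) countIJ / docCount;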
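
To sanity-check the numbers coming out of the index, it can help to compute the same two ratios on a handful of documents without Lucene at all. The following is only a minimal sketch using plain Java collections: the class name `TermStats`, its methods, and the naive lowercase tokenizer are assumptions made for illustration, not part of Lucene, and the tokenizer will not match what a real Lucene analyzer produces.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TermStats {

    /** Naive tokenizer (an assumption): lowercase, split on non-word characters. */
    public static Set<String> tokenize(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
    }

    /** Counts the documents whose token set contains every one of the given terms. */
    public static int countDocsWithAll(List<Set<String>> docs, String... terms) {
        int count = 0;
        for (Set<String> doc : docs) {
            if (doc.containsAll(Arrays.asList(terms))) {
                count++;
            }
        }
        return count;
    }

    /** Support: documents containing both terms divided by all documents. */
    public static double support(List<Set<String>> docs, String termI, String termJ) {
        if (docs.isEmpty()) {
            return 0.0;
        }
        return (double) countDocsWithAll(docs, termI, termJ) / docs.size();
    }

    /** Confidence: documents containing both terms divided by documents containing term i. */
    public static double confidence(List<Set<String>> docs, String termI, String termJ) {
        int countI = countDocsWithAll(docs, termI);
        if (countI == 0) {
            return 0.0;
        }
        return (double) countDocsWithAll(docs, termI, termJ) / countI;
    }

    public static void main(String[] args) {
        // Four tiny stand-in documents; real input would be the contents of the .txt files.
        List<Set<String>> docs = Arrays.asList(
                tokenize("apache lucene search"),
                tokenize("lucene index"),
                tokenize("java search"),
                tokenize("lucene search engine"));

        System.out.println("support    = " + support(docs, "lucene", "search"));
        System.out.println("confidence = " + confidence(docs, "lucene", "search"));
    }
}
```

In this sample, "lucene" appears in three of the four documents and two of those also contain "search", so support is 2/4 and confidence is 2/3. Running the Lucene version on the same small sample should give the same ratios, provided the indexed field is tokenized in a comparable way.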