
Analyzing Lucene 4 Query Score Calculation (Part 2)


Continuing from the previous article on Lucene 4 query score calculation: as noted there, TermQuery never invokes the coord(q, d) function. The query type that does execute it is mainly BooleanQuery, through the following method:

    public float coord(int overlap, int maxOverlap) {
      // LUCENE-4300: in most cases of maxOverlap=1, BQ rewrites itself away,
      // so coord() is not applied. But when BQ cannot optimize itself away
      // for a single clause (minNrShouldMatch, prohibited clauses, etc), its
      // important not to apply coord(1,1) for consistency, it might not be 1.0F
      return maxOverlap == 1 ? 1F : similarity.coord(overlap, maxOverlap);
    }

Suppose we query the bookname field for "ab bc qq xq", and the indexed bookname values are "ab bc" and "bc bc". For the document "ab bc", overlap is 2: only the terms ab and bc are actually found in the document. maxOverlap is 4, because "ab bc qq xq" tokenizes into the four terms ab, bc, qq, and xq. So coord = 2/4 = 0.5.
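In DefaultSimilarity this factor is simply the ratio of matched query terms to total query terms. A minimal standalone sketch of that arithmetic (plain Java, no Lucene dependency; the class name is illustrative):

```java
public class CoordDemo {
    // Mirrors DefaultSimilarity.coord(): the fraction of query terms
    // that actually occur in the document.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // Query "ab bc qq xq" tokenizes into 4 terms; the document
        // "ab bc" matches 2 of them (ab and bc).
        System.out.println(coord(2, 4)); // 0.5
    }
}
```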

When a BooleanQuery is used, scoring goes through ConjunctionScorer's score() method:

  public float score() throws IOException {
    // TODO: sum into a double and cast to float if we ever send required clauses to BS1
    float sum = 0.0f;
    for (DocsAndFreqs docs : docsAndFreqs) {
      sum += docs.scorer.score();
    }
    return sum * coord;
  }
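So the BooleanQuery score is simply the sum of every matching clause's score, scaled by coord. A self-contained sketch of that combination (plain Java, no Lucene types; the two clause scores are the term weights that appear in the explain output later in this post):

```java
import java.util.List;

public class ConjunctionScoreDemo {
    // Each clause contributes its own score; the BooleanQuery multiplies
    // the sum by coord, exactly as ConjunctionScorer.score() does above.
    static float booleanScore(List<Float> clauseScores, float coord) {
        float sum = 0.0f;
        for (float s : clauseScores) {
            sum += s;
        }
        return sum * coord;
    }

    public static void main(String[] args) {
        // The two matching clauses (ab, bc) with coord(2/4) = 0.5.
        float score = booleanScore(List.of(0.19459413f, 0.09873645f), 0.5f);
        System.out.println(score); // 0.14666529
    }
}
```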

The test main class is as follows:

 public class LuceneScoreTest {
 
	public static void main(String[] args) throws Exception {
		test3();
 
	}
 
 
	public static void test3() throws Exception {
		Directory dir = new RAMDirectory();
		IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_45,
				new StandardAnalyzer(Version.LUCENE_45));
		IndexWriter writer = new IndexWriter(dir, config);
 
		Document doc1 = new Document();
		Document doc2 = new Document();
		Document doc3 = new Document();
		doc1.add(new StringField("id", "1", Store.YES));
		doc1.add(new TextField("bookname", "bc bc", Store.YES));
 
		// doc2.add(new StringField("id", "2", Store.YES));
		doc2.add(new TextField("bookname", "ab bc", Store.YES));
 
		// doc3.add(new StringField("id", "3", Store.YES));
		doc3.add(new TextField("bookname", "ab bc cd", Store.YES));
 
		writer.addDocument(doc1);
		writer.addDocument(doc2);
		writer.addDocument(doc3);
 
		writer.close();
		IndexReader reader = DirectoryReader.open(dir);
		IndexSearcher searcher = new IndexSearcher(reader);
 
		/*TermQuery q = new TermQuery(new Term("bookname", "bc"));
		BooleanQuery booleanQuery = new BooleanQuery();
		booleanQuery.add(q, Occur.MUST);
		booleanQuery.add(new TermQuery(new Term("id", "1")), Occur.MUST);*/
		// q.setBoost(2f);
		QueryParser qp = new QueryParser(Version.LUCENE_45, "bookname",  new StandardAnalyzer(Version.LUCENE_45));
		qp.setDefaultOperator(QueryParser.OR_OPERATOR);
		Query query = qp.parse("ab bc qq xq");
		TopDocs topdocs = searcher.search(query, 5);
		ScoreDoc[] scoreDocs = topdocs.scoreDocs;
		// System.out.println("total hits---" +
		// topdocs.totalHits + " max score--" + topdocs.getMaxScore());
		// print the query results
		for (int i = 0; i < scoreDocs.length; i++) {
			int doc = scoreDocs[i].doc;
			Document document = searcher.doc(doc);
			System.out.println("bookname====" + document.get("bookname"));
			// System.out.println("id--" + scoreDocs[i].doc + "---scors--" +
			// scoreDocs[i].score+"---index--"+scoreDocs[i].shardIndex);
			System.out.println(searcher.explain(query, doc));//
			System.out.println(scoreDocs[i].score);
		}
		reader.close();
	}
}

The explain output:

0.14666529 = (MATCH) product of:
  0.29333058 = (MATCH) sum of:
    0.19459413 = (MATCH) weight(bookname:ab in 1) [DefaultSimilarity], result of:
      0.19459413 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
        0.3113506 = queryWeight, product of:
          1.0 = idf(docFreq=2, maxDocs=3)
          0.3113506 = queryNorm
        0.625 = fieldWeight in 1, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          1.0 = idf(docFreq=2, maxDocs=3)
          0.625 = fieldNorm(doc=1)
    0.09873645 = (MATCH) weight(bookname:bc in 1) [DefaultSimilarity], result of:
      0.09873645 = score(doc=1,freq=1.0 = termFreq=1.0
), product of:
        0.22178063 = queryWeight, product of:
          0.71231794 = idf(docFreq=3, maxDocs=3)
          0.3113506 = queryNorm
        0.4451987 = fieldWeight in 1, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          0.71231794 = idf(docFreq=3, maxDocs=3)
          0.625 = fieldNorm(doc=1)
  0.5 = coord(2/4)
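Every number in this tree can be reproduced from DefaultSimilarity's formulas: idf = 1 + ln(maxDocs / (docFreq + 1)), queryNorm = 1 / sqrt(Σ idf²) over all four query terms, and each term weight is queryWeight × fieldWeight. A sketch of that arithmetic (plain Java; the 0.625 fieldNorm is taken directly from the output, since it comes from Lucene's lossy one-byte encoding of the length norm 1/√2):

```java
public class ExplainDemo {
    // DefaultSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    static float idf(int docFreq, int maxDocs) {
        return (float) (Math.log(maxDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Rebuilds the explain tree for doc 1 ("ab bc") from first principles.
    static float finalScore() {
        int maxDocs = 3;
        float idfAb = idf(2, maxDocs); // 1.0      (ab occurs in 2 of 3 docs)
        float idfBc = idf(3, maxDocs); // 0.712... (bc occurs in all 3 docs)
        float idfQq = idf(0, maxDocs); // qq and xq match no documents,
        float idfXq = idf(0, maxDocs); // but still contribute to queryNorm

        // queryNorm = 1 / sqrt(sum of idf^2 over ALL four query terms).
        float sumSq = idfAb * idfAb + idfBc * idfBc + idfQq * idfQq + idfXq * idfXq;
        float queryNorm = (float) (1.0 / Math.sqrt(sumSq)); // 0.3113506

        // fieldNorm for the two-term field "ab bc": 1/sqrt(2), byte-encoded
        // and decoded back to 0.625 (value taken from the explain output).
        float fieldNorm = 0.625f;

        // per-term weight = queryWeight * fieldWeight
        //                 = (idf * queryNorm) * (tf * idf * fieldNorm), tf = 1.
        float wAb = (idfAb * queryNorm) * (idfAb * fieldNorm); // 0.19459413
        float wBc = (idfBc * queryNorm) * (idfBc * fieldNorm); // 0.09873645

        return (wAb + wBc) * (2 / 4f); // sum * coord(2/4) = 0.14666529
    }

    public static void main(String[] args) {
        System.out.println(finalScore());
    }
}
```

Note how the unmatched terms qq and xq still lower the score of every document through queryNorm, even though they contribute no term weight of their own.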

Note: the Lucene version used above is 4.5.1.

Permalink: http://www.chepoo.com/analysis-lucene4-query-score-calculated-2.html | IT技术精华网

