The DivergenceFromRandomness models use normalisation 2 for TermFrequencyNormalisation.

Normalisation 2 assumes that the term frequency density is a decreasing function of DocumentLength. The formula for normalisation 2 is as follows:

tfn = tf * log(1 + c*(sl/dl))

where tfn and tf are, respectively, the normalised and original term frequency of the query term in the document, sl is the average document length in the whole collection, and dl is the length of the document. The parameter c can be set automatically, as described by He and Ounis, 'Term Frequency Normalisation Tuning for BM25 and DFR model', in Proceedings of ECIR'05, 2005.
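
The formula above can be sketched in Python as follows. This is a minimal illustration, not the implementation of any particular system. The function name normalisation2 and the default c = 1.0 are hypothetical choices made for the example. The base of the logarithm is not stated above, so the sketch assumes base 2.

```python
import math


def normalisation2(tf, dl, sl, c=1.0):
    """Return tfn = tf * log(1 + c * (sl / dl)).

    tf: original term frequency of the query term in the document
    dl: length of the document
    sl: average document length in the whole collection
    c:  free parameter (the default of 1.0 is an assumption for this sketch)
    """
    if dl <= 0:
        raise ValueError("document length dl must be positive")
    # Assumption: logarithm taken in base 2; the formula above leaves the base open.
    return tf * math.log2(1.0 + c * (sl / dl))


# A document of exactly average length keeps its term frequency when c = 1.
average = normalisation2(tf=3, dl=100, sl=100, c=1.0)

# Longer-than-average documents are damped, shorter ones are boosted.
longer = normalisation2(tf=3, dl=300, sl=100, c=1.0)
shorter = normalisation2(tf=3, dl=50, sl=100, c=1.0)
```

With c = 1 and dl equal to sl, the logarithm evaluates to 1 under the base-2 assumption, so tfn equals tf. Larger values of c raise tfn for every document length.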

last edited 2005-05-02 12:16:25 by IadhOunis