Term frequency normalisation smooths the dependence of the WithinDocumentTermFrequency on the DocumentLength, so that a term's weight is not inflated simply because the document is long.


A. Singhal, C. Buckley and M. Mitra. Pivoted Document Length Normalization. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Pages 21-29. Zurich, Switzerland. 1996.

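Singhal et al. give the following two reasons for the need for term frequency normalisation: long documents tend to use the same terms repeatedly, which gives them higher term frequencies; and long documents contain more distinct terms, which increases the number of matches between a query and the document.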

As a consequence, an information retrieval system without term frequency normalisation produces biased results and favours long documents.
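The bias can be illustrated with a small sketch. The scoring functions, the toy documents and the division by document length are assumptions made purely for illustration; they are not the normalisation used by any particular weighting model. Repeating a document three times triples its raw term frequency score, although its content has not changed, while even a naive length normalisation leaves the score unchanged.

```python
from collections import Counter


def raw_tf_score(query_terms, doc_tokens):
    """Sum of the raw within-document term frequencies of the query terms."""
    counts = Counter(doc_tokens)
    return sum(counts[t] for t in query_terms)


def length_normalised_score(query_terms, doc_tokens):
    """Same sum divided by the document length (a deliberately naive normalisation)."""
    return raw_tf_score(query_terms, doc_tokens) / len(doc_tokens)


# Hypothetical toy data: the long document is the short one repeated three times.
short_doc = "term frequency normalisation reduces length bias".split()
long_doc = short_doc * 3
query = ["term", "frequency"]

# Raw scores differ by a factor of three; normalised scores are the same.
print(raw_tf_score(query, short_doc), raw_tf_score(query, long_doc))
print(length_normalised_score(query, short_doc),
      length_normalised_score(query, long_doc))
```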

Both the DivergenceFromRandomness models and BM25 include a term frequency normalisation component in their formulas.
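As a rough sketch of what such components look like, the following shows the commonly cited Normalisation 2 from the DFR framework, tfn = tf * log2(1 + c * avg_l / l), and the length-normalised term frequency factor of BM25. The function names and the default parameter values (c = 1.0, k1 = 1.2, b = 0.75) are assumptions chosen for illustration; suitable values depend on the collection, and a real system would combine these factors with the rest of the weighting model.

```python
import math


def normalisation2(tf, doc_len, avg_doc_len, c=1.0):
    """DFR Normalisation 2: tfn = tf * log2(1 + c * avg_doc_len / doc_len)."""
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)


def bm25_tf_component(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 term frequency factor with document length normalisation."""
    # b controls how strongly the document length affects the factor.
    length_factor = (1.0 - b) + b * doc_len / avg_doc_len
    return tf * (k1 + 1.0) / (tf + k1 * length_factor)


# Same term frequency in an average-length document and in one twice as long
# (hypothetical lengths): the longer document receives the smaller value.
for doc_len in (100, 200):
    print(doc_len,
          normalisation2(3, doc_len, 100),
          bm25_tf_component(3, doc_len, 100))
```

With c = 1 and a document of exactly average length, Normalisation 2 leaves the term frequency unchanged; for longer documents both components shrink the contribution of the same raw frequency.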

last edited 2005-05-01 15:58:03 by CraigMacdonald