Language modeling (LM) estimates the probability of a query term in a document by smoothing the maximum likelihood estimate (MLE) of the within-document term frequency, tf/length(d), with the term's relative frequency in the collection, TF/FreqTotColl.

Smoothing can be obtained either by linearly mixing these two probabilities or by deriving the estimate from the compound of the multinomial distribution with a Dirichlet prior.
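
The two smoothing strategies above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Terrier's code: the function names, the parameter names `lam` and `mu`, and their default values are hypothetical. Here `lam` is taken to be the weight on the document estimate; some formulations put it on the collection estimate instead.

```python
def mixture_smoothing(tf, doc_length, coll_tf, coll_tokens, lam=0.5):
    """Linear mixture of the document MLE and the collection estimate.

    lam is an assumed mixing weight on the document model (0 <= lam <= 1).
    """
    p_doc = tf / doc_length          # MLE within the document: tf/length(d)
    p_coll = coll_tf / coll_tokens   # relative frequency: TF/FreqTotColl
    return lam * p_doc + (1.0 - lam) * p_coll


def dirichlet_smoothing(tf, doc_length, coll_tf, coll_tokens, mu=2500.0):
    """Estimate obtained with a Dirichlet prior over the multinomial.

    mu is an assumed prior strength; the prior mean is the collection model.
    """
    p_coll = coll_tf / coll_tokens
    return (tf + mu * p_coll) / (doc_length + mu)


if __name__ == "__main__":
    # Hypothetical counts: a term seen 3 times in a 100-token document
    # and 1000 times in a collection of 1,000,000 tokens.
    print(mixture_smoothing(3, 100, 1000, 1_000_000))
    print(dirichlet_smoothing(3, 100, 1000, 1_000_000))
```

In both cases a term that is absent from the document (tf = 0) still receives a non-zero probability as long as it occurs somewhere in the collection, which is the purpose of smoothing. The Dirichlet form can also be read as a mixture whose weight depends on document length: longer documents rely more on their own counts.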

Terrier provides implementations of Hiemstra's query-likelihood model and of Dirichlet smoothing.

last edited 2010-03-03 17:19:23 by CraigMacdonald