TF-IDF is a classical information retrieval term weighting model. It estimates the importance of a term in a given document by multiplying the raw term frequency (TF) of the term in that document by the term's inverse document frequency (IDF) weight:

idf_k = log (NDoc / D_k)

w_kd = f_kd * idf_k

where f_kd is the frequency with which keyword k occurs in document d, NDoc is the total number of documents in the collection, and D_k is the number of documents containing keyword k.
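The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function name tf_idf, the toy documents, and the choice of the natural logarithm are assumptions, since the text does not fix a log base.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document, with w_kd = f_kd * log(NDoc / D_k)."""
    n_docs = len(docs)  # NDoc: total number of documents
    # D_k: number of documents that contain term k at least once
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in docs:
        term_freq = Counter(tokens)  # f_kd: raw frequency of k in d
        weights.append({
            k: f * math.log(n_docs / doc_freq[k])  # natural log is an assumption
            for k, f in term_freq.items()
        })
    return weights

# Hypothetical toy collection of already-tokenised documents
docs = [
    ["terrier", "retrieval", "retrieval"],
    ["terrier", "indexing"],
    ["terrier", "retrieval", "ranking"],
]
weights = tf_idf(docs)
```

A term that occurs in every document (here "terrier") gets an IDF of log(1) = 0 and therefore a weight of zero, while a term confined to a single document (here "indexing") gets the largest IDF.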

There are many variants of TF-IDF depending on whether TF is normalised and/or how IDF is estimated.

Terrier implements the TF-IDF weighting model as a combination of Okapi's TF (Robertson et al., TREC 4, 1995) and Sparck Jones' IDF (Sparck Jones, JoD, 1972).
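As a rough illustration of what such a combination looks like, the sketch below pairs a saturating, length-normalised TF of the Okapi kind with the log(NDoc / D_k) IDF given earlier. It is not taken from Terrier's source code: the function names, the exact form of the TF component, and the parameter values k1 = 1.2 and b = 0.75 (commonly quoted Okapi defaults) are all assumptions.

```python
import math

def okapi_tf(tf, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """Okapi-style TF: grows with tf but saturates below k1; b controls length normalisation.

    k1 and b are assumed defaults, not values confirmed for Terrier.
    """
    return (k1 * tf) / (tf + k1 * (1.0 - b + b * doc_len / avg_doc_len))

def idf(n_docs, doc_freq):
    """IDF as in the formula above: log(NDoc / D_k), natural log assumed."""
    return math.log(n_docs / doc_freq)

def okapi_tf_idf(tf, doc_len, avg_doc_len, n_docs, doc_freq):
    """Weight of one term in one document under the assumed combination."""
    return okapi_tf(tf, doc_len, avg_doc_len) * idf(n_docs, doc_freq)

# Hypothetical numbers: the term occurs 3 times in a 120-token document,
# the average document has 100 tokens, and 50 of 1000 documents contain the term.
score = okapi_tf_idf(tf=3, doc_len=120, avg_doc_len=100, n_docs=1000, doc_freq=50)
```

Compared with the raw f_kd used earlier, the normalised TF keeps very frequent terms and very long documents from dominating the score.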

More information about classical term weighting models can be found in the following paper:

G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24 (5):513--523, 1988.

last edited 2007-03-27 11:33:19 by ErikGraf