FormulasOfDFRModels

In all of the Divergence From Randomness (DFR) models implemented in Terrier, the relevance score of a document d for a query Q is given by

http://ir.dcs.gla.ac.uk/terrier/images/DFRFramework.png

The LaTeX source code of the above formula is as follows:

\begin{equation}\label{eDFRFramework}

score(d, Q)=\sum_{t\in Q}qtw\cdot w(t,d)

\end{equation}

where t is a query term in Q, and qtw is the query term weight, given by qtf/qtfmax. Here qtf is the query term frequency and qtfmax is the maximum qtf among all the query terms. w(t,d) is the weight of document d for query term t, given by one of the DFR models described below.
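As an illustration of the summation above, here is a minimal Python sketch. The function name `score` and the callable `term_weight` (standing in for any model's w(t,d)) are hypothetical, and the sketch assumes the sum runs over the distinct terms of the query.

```python
from collections import Counter


def score(query_terms, term_weight):
    """Return the sum over distinct query terms of qtw * w(t, d).

    query_terms: list of query tokens (repeats allowed).
    term_weight: hypothetical callable t -> w(t, d) for one fixed document d.
    """
    qtf = Counter(query_terms)      # qtf for each distinct query term
    if not qtf:
        return 0.0                  # an empty query contributes nothing
    qtf_max = max(qtf.values())     # qtfmax
    return sum((f / qtf_max) * term_weight(t) for t, f in qtf.items())


# Toy weights standing in for the output of a DFR model.
weights = {"terrier": 2.0, "retrieval": 4.0}
print(score(["terrier", "terrier", "retrieval"], weights.get))
```

In the toy query, "terrier" has qtw = 2/2 and "retrieval" has qtw = 1/2, so both terms contribute 2.0 to the score.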

[http://ir.dcs.gla.ac.uk/terrier/doc/javadoc/ DLH]

http://ir.dcs.gla.ac.uk/terrier/images/DLH.PNG

The LaTeX source code of the hypergeometric model (DLH) is as follows:

\begin{equation}\label{eDLH}

\frac{1}{tf+0.5}\cdot\bigg(\log_2(\frac{tf\cdot avg\_l}{l}\cdot\frac{N}{F})+(l-tf)\log_2(1-f)+0.5\log_2\big(2\pi tf(1-f)\big)\bigg)

\end{equation}
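The DLH formula can be evaluated directly, as in the sketch below. The function name `dlh_weight` is hypothetical. The symbol f is not defined in the notation list on this page, so the sketch assumes it is the relative within-document frequency tf/l. The formula is only defined for 0 < tf < l.

```python
import math


def dlh_weight(tf, l, avg_l, N, F):
    """DLH weight of a document for one term (sketch, requires 0 < tf < l)."""
    f = tf / l  # assumption: f is the relative within-document frequency
    return (1.0 / (tf + 0.5)) * (
        math.log2((tf * avg_l / l) * (N / F))
        + (l - tf) * math.log2(1.0 - f)
        + 0.5 * math.log2(2.0 * math.pi * tf * (1.0 - f))
    )


# A term occurring twice in an average-length document of 100 tokens.
print(dlh_weight(tf=2, l=100, avg_l=100, N=1000, F=10))
```

Note that DLH uses the raw tf and has no free parameter c.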

BB2

http://ir.dcs.gla.ac.uk/terrier/images/BB2.PNG

The LaTeX source code of the BB2 model is as follows:

\begin{equation}\label{eBB2}

\frac{F+1}{n_t\cdot(tfn+1)}\big(-\log_2(N-1)-\log_2(e)+f(N+F-1,N+F-tfn-2)-f(F,F-tfn)\big)

\end{equation}

PL2

http://ir.dcs.gla.ac.uk/terrier/images/PL2.PNG

The LaTeX source code of the PL2 model is as follows:

\begin{equation}\label{ePL2}

\frac{1}{tfn+1}\big(tfn\cdot\log_2\frac{tfn}{\lambda}+(\lambda+\frac{1}{12\cdot tfn}-tfn)\cdot\log_2e+0.5\cdot\log_2(2\pi\cdot tfn)\big)

\end{equation}
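A small Python sketch of the PL2 weight follows. The function name `pl2_weight` is hypothetical. It takes an already normalised term frequency tfn and uses λ = F/N as given in the notation list. It requires tfn > 0.

```python
import math


def pl2_weight(tfn, N, F):
    """PL2 weight for one term, given a normalised term frequency tfn > 0."""
    lam = F / N  # mean and variance of the Poisson distribution
    return (1.0 / (tfn + 1.0)) * (
        tfn * math.log2(tfn / lam)
        + (lam + 1.0 / (12.0 * tfn) - tfn) * math.log2(math.e)
        + 0.5 * math.log2(2.0 * math.pi * tfn)
    )


print(pl2_weight(tfn=1.0, N=1000, F=10))
```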

I(n)L2

http://ir.dcs.gla.ac.uk/terrier/images/InL2.PNG

The LaTeX source code of the I(n)L2 model is as follows:

\begin{equation}\label{eInL2}

\frac{1}{tfn+1}\big(tfn\cdot\log_2\frac{N+1}{n_t+0.5}\big)

\end{equation}
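The I(n)L2 formula is short enough to check by hand. The sketch below uses a hypothetical function name, `inl2_weight`, and takes the normalised term frequency tfn as an input.

```python
import math


def inl2_weight(tfn, N, n_t):
    """I(n)L2 weight for one term, given tfn, collection size N, document frequency n_t."""
    return (1.0 / (tfn + 1.0)) * (tfn * math.log2((N + 1.0) / (n_t + 0.5)))


# A rarer term (smaller n_t) receives a larger weight for the same tfn.
print(inl2_weight(tfn=1.0, N=1000, n_t=10))
print(inl2_weight(tfn=1.0, N=1000, n_t=100))
```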

I(F)B2

http://ir.dcs.gla.ac.uk/terrier/images/IFB2.PNG

The LaTeX source code of the I(F)B2 model is as follows:

\begin{equation}\label{eIFB2}

\frac{F+1}{n_t\cdot(tfn+1)}\big(tfn\cdot\log_2\frac{N+1}{F+0.5}\big)

\end{equation}
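The I(F)B2 formula is a product of two factors, which the following sketch computes separately. The function and variable names are hypothetical. Reading the leading fraction as the "B" normalisation and the bracketed term as the "I(F)" information content is an interpretation of the model name, not something this page states.

```python
import math


def ifb2_weight(tfn, N, F, n_t):
    """I(F)B2 weight for one term, split into its two factors."""
    b_factor = (F + 1.0) / (n_t * (tfn + 1.0))           # leading fraction
    info = tfn * math.log2((N + 1.0) / (F + 0.5))        # bracketed term
    return b_factor * info


print(ifb2_weight(tfn=1.0, N=1000, F=20, n_t=10))
```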

In(exp)B2

http://ir.dcs.gla.ac.uk/terrier/images/InexpB2.PNG

The LaTeX source code of the In(exp)B2 model is as follows:

\begin{equation}\label{eInexpB2}

\frac{F+1}{n_t\cdot(tfn+1)}\big(tfn\cdot\log_2\frac{N+1}{n_e+0.5}\big)

\end{equation}

In(exp)C2

http://ir.dcs.gla.ac.uk/terrier/images/InexpC2.PNG

The LaTeX source code of the In(exp)C2 model is as follows:

\begin{equation}\label{eInexpC2}

\frac{F+1}{n_t\cdot(tfn_e+1)}\big(tfn_e\cdot\log_2\frac{N+1}{n_e+0.5}\big)

\end{equation}

Notations

tf is the within-document frequency of t in d.

avg_l is the average document length in the collection.

l is the document length of d, which is the number of tokens in d.

N is the number of documents in the whole collection.

F is the term frequency of t in the whole collection.

n_t is the document frequency of t.

tfn is the normalised term frequency. It is given by normalisation 2:

http://ir.dcs.gla.ac.uk/terrier/images/Norm2.PNG

The LaTeX source code of normalisation 2 is as follows:

\begin{equation}\label{eNormalisation2}

tfn=tf\cdot\log_2(1+c\cdot\frac{avg\_l}{l})

\end{equation}

where c is a free parameter.
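To make the role of c concrete, here is a sketch of normalisation 2. It assumes the commonly cited form tfn = tf · log_2(1 + c · avg_l / l). The function name `normalisation2` and the default c = 1.0 are illustrative choices only.

```python
import math


def normalisation2(tf, l, avg_l, c=1.0):
    """Normalised term frequency tfn, assuming tfn = tf * log2(1 + c * avg_l / l)."""
    return tf * math.log2(1.0 + c * avg_l / l)


# An average-length document keeps tf unchanged when c = 1,
# while a document twice as long has its tf discounted.
print(normalisation2(tf=3, l=100, avg_l=100))
print(normalisation2(tf=3, l=200, avg_l=100))
```

Larger values of c weaken the length penalty, because the logarithm's argument grows for every document length.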

tfn_e is also the normalised term frequency. It is given by a modified version of normalisation 2:

http://ir.dcs.gla.ac.uk/terrier/images/Norm2e.PNG

The LaTeX source code of the modified normalisation 2 is as follows:

\begin{equation}\label{eNormalisation2e}

tfn_e=tf\cdot\log_e(1+c\cdot\frac{avg\_l}{l})

\end{equation}

where c is a free parameter.

λ is the mean and variance of a Poisson distribution. It is given by F/N, where F is assumed to be much smaller than N.

n_e is given by N(1 - (1 - n_t/N)^F).
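The sketch below computes n_e and plugs it into the In(exp)B2 formula given earlier. It assumes the trailing F in the definition of n_e is an exponent, that is, n_e = N(1 - (1 - n_t/N)^F). Both function names are hypothetical.

```python
import math


def expected_doc_frequency(N, n_t, F):
    """n_e, assuming n_e = N * (1 - (1 - n_t / N) ** F)."""
    return N * (1.0 - (1.0 - n_t / N) ** F)


def inexp_b2_weight(tfn, N, F, n_t):
    """In(exp)B2 weight for one term, using n_e in place of n_t inside the logarithm."""
    n_e = expected_doc_frequency(N, n_t, F)
    return (F + 1.0) / (n_t * (tfn + 1.0)) * (
        tfn * math.log2((N + 1.0) / (n_e + 0.5))
    )


print(expected_doc_frequency(N=1000, n_t=10, F=1))
print(inexp_b2_weight(tfn=1.5, N=1000, F=12, n_t=10))
```

With F = 1 the expression reduces to n_t, which is a quick sanity check on the reading of the exponent.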

The relation f is given by the Stirling formula:

f(n, m) = (m + 0.5) log_2(n/m) + (n - m) log_2(n)
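The relation f is what the BB2 formula earlier on this page calls twice. A minimal sketch of both follows, with hypothetical function names `stirling_f` and `bb2_weight`. The BB2 expression needs F > tfn so that the second argument of f stays positive.

```python
import math


def stirling_f(n, m):
    """f(n, m) = (m + 0.5) * log2(n / m) + (n - m) * log2(n)."""
    return (m + 0.5) * math.log2(n / m) + (n - m) * math.log2(n)


def bb2_weight(tfn, N, F, n_t):
    """BB2 weight for one term (sketch, requires F > tfn)."""
    return (F + 1.0) / (n_t * (tfn + 1.0)) * (
        -math.log2(N - 1.0)
        - math.log2(math.e)
        + stirling_f(N + F - 1.0, N + F - tfn - 2.0)
        - stirling_f(F, F - tfn)
    )


print(stirling_f(8, 4))  # 4.5 * log2(2) + 4 * log2(8) = 16.5
print(bb2_weight(tfn=1.5, N=1000, F=12, n_t=10))
```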