## The Divergence From Randomness (DFR) Framework & Terrier

In all DFR models implemented in Terrier, the relevance score of a document d for a query Q is given by the formula below, shown as LaTeX source code:

\begin{equation}\label{eDFRFramework}

score(d, Q)=\sum_{t\in Q}qtw\cdot w(t,d)

\end{equation}

where t is a term of the query Q and qtw is the query term weight, given by qtf/qtf_{max}. Here qtf is the frequency of t in the query, and qtf_{max} is the maximum qtf over all query terms. w(t,d) is the weight of document d for the query term t, given by one of the DFR models described below.
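The aggregation above can be sketched in a few lines of Python. This is only an illustration, not Terrier's code: the names `score` and `term_weight` are hypothetical, `term_weight` stands for w(t,d) with the document d already fixed, and the sum is assumed to run over the distinct terms of the query.

```python
from collections import Counter

def score(query_terms, term_weight):
    """Sum qtw * w(t, d) over the distinct terms of the query.

    query_terms: list of query tokens (repeats allowed).
    term_weight: hypothetical callable returning w(t, d) for a term t.
    """
    qtf = Counter(query_terms)          # query term frequencies
    if not qtf:
        return 0.0                      # empty query scores nothing
    qtf_max = max(qtf.values())         # largest qtf in the query
    # qtw = qtf / qtf_max, as defined in the text above
    return sum((f / qtf_max) * term_weight(t) for t, f in qtf.items())
```

For the query `["a", "b", "b"]`, term `b` gets qtw = 1 and term `a` gets qtw = 0.5, so repeated query terms count more.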

### [DLH](http://ir.dcs.gla.ac.uk/terrier/doc/javadoc/)

LaTeX source code of the HypergeometricModel (DLH) is as follows:

\begin{equation}\label{eDLH}

\frac{1}{tf+0.5}\cdot\bigg(\log_2(\frac{tf\cdot avg\_l}{l}\cdot\frac{N}{F})+(l-tf)\log_2(1-f)+0.5\log_2\big(2\pi tf(1-f)\big)\bigg)

\end{equation}
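As a rough illustration, the DLH formula above can be transcribed into Python as follows. The function name `dlh_weight` and its parameter names are hypothetical. The sketch also assumes that f denotes the relative within-document frequency tf/l, which this page does not define. It is a direct transcription of the formula as written here, not Terrier's own implementation.

```python
import math

def dlh_weight(tf, doc_len, avg_doc_len, N, F):
    """Transcription of the DLH formula as written above.

    tf: within-document frequency, doc_len: l, avg_doc_len: avg_l,
    N: number of documents, F: collection frequency of the term.
    """
    if not 0 < tf < doc_len:
        raise ValueError("requires 0 < tf < doc_len so that both logs are defined")
    f = tf / doc_len  # ASSUMPTION: f is the relative frequency tf / l
    inner = (
        math.log2((tf * avg_doc_len / doc_len) * (N / F))
        + (doc_len - tf) * math.log2(1.0 - f)
        + 0.5 * math.log2(2.0 * math.pi * tf * (1.0 - f))
    )
    return inner / (tf + 0.5)
```

Note that DLH takes the raw tf rather than a normalised tfn, so the sketch needs no free parameter c.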

### BB2

LaTeX source code of the BB2 model is as follows:

\begin{equation}\label{eBB2}

\frac{F+1}{n_t\cdot(tfn+1)}\big(-\log_2(N-1)-\log_2(e)+f(N+F-1,N+F-tfn-2)-f(F,F-tfn)\big)

\end{equation}
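The BB2 formula relies on the relation f defined in the Notations section. A minimal sketch of both, assuming tfn has already been computed by normalisation 2, could look like this. The names `stirling` and `bb2_weight` are hypothetical and are not taken from Terrier.

```python
import math

def stirling(n, m):
    """The relation f(n, m) from the Notations section."""
    return (m + 0.5) * math.log2(n / m) + (n - m) * math.log2(n)

def bb2_weight(tfn, N, F, n_t):
    """Transcription of the BB2 formula above.

    tfn: normalised term frequency, N: number of documents,
    F: collection frequency of the term, n_t: document frequency.
    Requires N > 1 and F > tfn so that every logarithm is defined.
    """
    gain = (F + 1.0) / (n_t * (tfn + 1.0))
    info = (
        -math.log2(N - 1.0)
        - math.log2(math.e)
        + stirling(N + F - 1.0, N + F - tfn - 2.0)
        - stirling(F, F - tfn)
    )
    return gain * info
```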

### PL2

LaTeX source code of the PL2 model is as follows:

\begin{equation}\label{ePL2}

\frac{1}{tfn+1}\big(tfn\cdot\log_2\frac{tfn}{\lambda}+(\lambda+\frac{1}{12\cdot tfn}-tfn)\cdot\log_2e+0.5\cdot\log_2(2\pi\cdot tfn)\big)

\end{equation}
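The PL2 formula can be sketched in Python as below, with λ = F/N as stated in the Notations section. The function name `pl2_weight` is hypothetical, and tfn is assumed to be computed beforehand by normalisation 2.

```python
import math

def pl2_weight(tfn, N, F):
    """Transcription of the PL2 formula above.

    tfn: normalised term frequency (must be > 0),
    N: number of documents, F: collection frequency of the term.
    """
    lam = F / N  # mean and variance of the Poisson model
    info = (
        tfn * math.log2(tfn / lam)
        + (lam + 1.0 / (12.0 * tfn) - tfn) * math.log2(math.e)
        + 0.5 * math.log2(2.0 * math.pi * tfn)
    )
    return info / (tfn + 1.0)
```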

### I(n)L2

LaTeX source code of the I(n)L2 model is as follows:

\begin{equation}\label{eInL2}

\frac{1}{tfn+1}\big(tfn\cdot\log_2\frac{N+1}{n_t+0.5}\big)

\end{equation}
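A short Python sketch of the I(n)L2 formula is given below. The function name `inl2_weight` is hypothetical, and tfn is again assumed to come from normalisation 2.

```python
import math

def inl2_weight(tfn, N, n_t):
    """Transcription of the I(n)L2 formula above.

    tfn: normalised term frequency, N: number of documents,
    n_t: document frequency of the term.
    """
    idf = math.log2((N + 1.0) / (n_t + 0.5))  # inverse document frequency part
    return tfn * idf / (tfn + 1.0)
```

Written this way, the weight is the idf-like term scaled by tfn/(tfn+1), so it grows with tfn but never exceeds the idf-like term itself.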

### I(F)B2

LaTeX source code of the I(F)B2 model is as follows:

\begin{equation}\label{eIFB2}

\frac{F+1}{n_t\cdot(tfn+1)}\big(tfn\cdot\log_2\frac{N+1}{F+0.5}\big)

\end{equation}

### In(exp)B2

LaTeX source code of the In(exp)B2 model is as follows:

\begin{equation}\label{eInexpB2}

\frac{F+1}{n_t\cdot(tfn+1)}\big(tfn\cdot\log_2\frac{N+1}{n_e+0.5}\big)

\end{equation}

### In(exp)C2

LaTeX source code of the In(exp)C2 model is as follows:

\begin{equation}\label{eInexpC2}

\frac{F+1}{n_t\cdot(tfn_e+1)}\big(tfn_e\cdot\log_2\frac{N+1}{n_e+0.5}\big)

\end{equation}
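In(exp)C2 differs from In(exp)B2 only in using tfn_e, the natural-logarithm variant of normalisation 2 defined in the Notations section. The sketch below makes that difference explicit. The function name `inexpc2_weight`, its parameter names, and the default c = 1.0 are all assumptions made for illustration, and n_e is taken as an input rather than computed.

```python
import math

def inexpc2_weight(tf, doc_len, avg_doc_len, N, F, n_t, n_e, c=1.0):
    """Transcription of the In(exp)C2 formula above.

    tf: within-document frequency, doc_len: l, avg_doc_len: avg_l,
    N: number of documents, F: collection frequency,
    n_t: document frequency, n_e: as defined in the Notations section,
    c: free parameter (the default 1.0 is only a placeholder).
    """
    # tfn_e uses the natural logarithm, unlike tfn which uses log base 2
    tfn_e = tf * math.log(1.0 + c * avg_doc_len / doc_len)
    gain = (F + 1.0) / (n_t * (tfn_e + 1.0))
    return gain * tfn_e * math.log2((N + 1.0) / (n_e + 0.5))
```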

### Notations

*tf* is the within-document frequency of t in d.

*avg_l* is the average document length in the collection.

*l* is the document length of d, which is the number of tokens in d.

*N* is the number of documents in the whole collection.

*F* is the term frequency of t in the whole collection.

*n_t* is the document frequency of t.

*tfn* is the normalised term frequency, given by normalisation 2. The LaTeX source code of normalisation 2 is as follows:

\begin{equation}\label{eNormalisation2}

tfn=tf\cdot\log_2(1+c\cdot\frac{avg\_l}{l})

\end{equation}

where c is a free parameter.
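Normalisation 2 is easy to check numerically. In the sketch below, the function name `normalisation2` and the default c = 1.0 are assumptions chosen for illustration, not recommended settings.

```python
import math

def normalisation2(tf, doc_len, avg_doc_len, c=1.0):
    """tfn = tf * log2(1 + c * avg_l / l)."""
    return tf * math.log2(1.0 + c * avg_doc_len / doc_len)
```

With c = 1, a document of exactly average length keeps its raw frequency (tfn = tf), while a longer document has its frequency scaled down and a shorter one has it scaled up.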

*tfn_e* is also a normalised term frequency, given by a modified version of normalisation 2 that uses the natural logarithm. The LaTeX source code of the modified normalisation 2 is as follows:

\begin{equation}\label{eNormalisation2e}

tfn_e=tf\cdot\log_e(1+c\cdot\frac{avg\_l}{l})

\end{equation}

where c is a free parameter.

*λ* is both the mean and the variance of a Poisson distribution. It is given by F/N, where F is assumed to be much smaller than N.

*n_e* is given by N(1-(1-n_t/N)^F).
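The definition of n_e can be evaluated directly. The helper name `n_e` below is hypothetical, and the code only transcribes the expression given above.

```python
def n_e(N, F, n_t):
    """n_e = N * (1 - (1 - n_t / N) ** F), as defined above.

    N: number of documents, F: collection frequency of the term,
    n_t: document frequency of the term.
    """
    return N * (1.0 - (1.0 - n_t / N) ** F)
```

For example, with N = 1000, F = 10 and n_t = 5 this gives a value of about 48.89.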

The relation *f* is given by the *Stirling formula*:

\begin{equation}\label{eStirling}

f(n,m)=(m+0.5)\cdot\log_2(\frac{n}{m})+(n-m)\cdot\log_2 n

\end{equation}