# FormulasOfDFRModels

In all of Terrier's implemented Divergence From Randomness (DFR) models, the relevance score of a document d for a query Q is given by the following formula, whose LaTeX source is:

\begin{equation}\label{eDFRFramework}

score(d, Q)=\sum_{t\in Q}qtw\cdot w(t,d)

\end{equation}

where t is a query term in Q, and qtw is the query term weight, given by qtf/qtfmax. Here qtf is the query term frequency and qtfmax is the maximum qtf among all the query terms. w(t,d) is the weight of document d for query term t, and is given by one of the DFR models described below.
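As a rough illustration of the summation, the following Python sketch computes score(d, Q) from raw query term frequencies. The function name `dfr_score` and the `weight` callback (standing in for w(t,d) of one fixed document) are hypothetical and are not part of Terrier's API.

```python
from typing import Callable, Dict


def dfr_score(query_tf: Dict[str, int], weight: Callable[[str], float]) -> float:
    """Return the sum over query terms of qtw * w(t, d).

    query_tf maps each query term t to its frequency qtf in the query.
    weight(t) is a hypothetical stand-in for w(t, d) of one fixed document d.
    """
    if not query_tf:
        return 0.0
    qtf_max = max(query_tf.values())  # qtfmax: largest qtf in the query
    return sum((qtf / qtf_max) * weight(t) for t, qtf in query_tf.items())


# Toy example with made-up weights: "terrier" occurs twice, "dfr" once,
# so the score is 1.0 * 3.0 + 0.5 * 4.0.
toy_weights = {"terrier": 3.0, "dfr": 4.0}
print(dfr_score({"terrier": 2, "dfr": 1}, toy_weights.__getitem__))
```

Dividing by qtfmax means the most frequent query term always has weight 1, and the other terms are scaled relative to it.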

### DLH

LaTeX source code of the DLH hypergeometric model is as follows:

\begin{equation}\label{eDLH}

\frac{1}{tf+0.5}\cdot\bigg(\log_2(\frac{tf\cdot avg\_l}{l}\cdot\frac{N}{F})+(l-tf)\log_2(1-f)+0.5\log_2\big(2\pi tf(1-f)\big)\bigg)

\end{equation}
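The DLH formula can be transcribed into Python as a minimal sketch. This page does not define the symbol f used in the formula; the code assumes it is the relative within-document frequency tf/l, which should be checked against the Terrier source. The function name `dlh_weight` is hypothetical.

```python
import math


def dlh_weight(tf: float, l: float, avg_l: float, N: int, F: int) -> float:
    """w(t, d) following the DLH formula as written on this page.

    Requires 0 < tf < l so that both logarithms are defined.
    """
    f = tf / l  # ASSUMPTION: f is the relative frequency tf / l (not defined on this page)
    return (
        math.log2((tf * avg_l / l) * (N / F))
        + (l - tf) * math.log2(1.0 - f)
        + 0.5 * math.log2(2.0 * math.pi * tf * (1.0 - f))
    ) / (tf + 0.5)


# Made-up statistics: tf=2 in a document of length 4, in a tiny collection.
print(dlh_weight(tf=2, l=4, avg_l=4, N=8, F=2))
```

Note that DLH has no free parameter: unlike the models below, it uses tf directly rather than the normalised tfn.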

### BB2

Latex source code of the BB2 model is as follows:

\begin{equation}\label{eBB2}

\frac{F+1}{n_t\cdot(tfn+1)}\big(-\log_2(N-1)-\log_2(e)+f(N+F-1,N+F-tfn-2)-f(F,F-tfn)\big)

\end{equation}
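A minimal Python sketch of BB2 follows, using the relation f(n,m) given by the Stirling formula in the Notations section. The names `stirling` and `bb2_weight` are hypothetical, and the sketch assumes F > tfn so that every logarithm has a positive argument.

```python
import math


def stirling(n: float, m: float) -> float:
    """The relation f(n, m) = (m + 0.5) * log2(n / m) + (n - m) * log2(n)."""
    return (m + 0.5) * math.log2(n / m) + (n - m) * math.log2(n)


def bb2_weight(tfn: float, N: int, F: int, n_t: int) -> float:
    """w(t, d) following the BB2 formula as written on this page."""
    return (F + 1) / (n_t * (tfn + 1)) * (
        -math.log2(N - 1)
        - math.log2(math.e)
        + stirling(N + F - 1, N + F - tfn - 2)
        - stirling(F, F - tfn)
    )


# Made-up statistics for a term occurring 50 times in 20 of 1000 documents.
print(bb2_weight(tfn=1.0, N=1000, F=50, n_t=20))
```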

### PL2

Latex source code of the PL2 model is as follows:

\begin{equation}\label{ePL2}

\frac{1}{tfn+1}\big(tfn\cdot\log_2\frac{tfn}{\lambda}+(\lambda-tfn)\cdot\log_2e+0.5\cdot\log_2(2\pi\cdot tfn)\big)

\end{equation}
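The PL2 formula can be sketched in Python as follows, with λ computed as F/N as stated in the Notations section. The function name `pl2_weight` is hypothetical, and the sketch assumes tfn > 0.

```python
import math


def pl2_weight(tfn: float, F: int, N: int) -> float:
    """w(t, d) following the PL2 formula as written on this page."""
    lam = F / N  # mean (and variance) of the Poisson distribution
    return (
        tfn * math.log2(tfn / lam)
        + (lam - tfn) * math.log2(math.e)
        + 0.5 * math.log2(2.0 * math.pi * tfn)
    ) / (tfn + 1.0)


# Made-up statistics: a term with 50 occurrences in a 1000-document collection.
print(pl2_weight(tfn=2.0, F=50, N=1000))
```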

### I(n)L2

Latex source code of the I(n)L2 model is as follows:

\begin{equation}\label{eInL2}

\frac{1}{tfn+1}\big(tfn\cdot\log_2\frac{N+1}{n_t+0.5}\big)

\end{equation}
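For illustration, I(n)L2 reduces to a few lines of Python. The function name `inl2_weight` is hypothetical.

```python
import math


def inl2_weight(tfn: float, N: int, n_t: int) -> float:
    """w(t, d) following the I(n)L2 formula as written on this page."""
    return tfn * math.log2((N + 1) / (n_t + 0.5)) / (tfn + 1.0)


# Made-up statistics: the rarer term (smaller n_t) receives the larger weight.
print(inl2_weight(tfn=3.0, N=1000, n_t=10))
print(inl2_weight(tfn=3.0, N=1000, n_t=500))
```

The logarithm is an inverse document frequency, and the factor tfn/(tfn+1) saturates towards 1 as tfn grows.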

### I(F)B2

Latex source code of the I(F)B2 model is as follows:

\begin{equation}\label{eIFB2}

\frac{F+1}{n_t\cdot(tfn+1)}\big(tfn\cdot\log_2\frac{N+1}{F+0.5}\big)

\end{equation}

### In(exp)B2

Latex source code of the In(exp)B2 model is as follows:

\begin{equation}\label{eInexpB2}

\frac{F+1}{n_t\cdot(tfn+1)}\big(tfn\cdot\log_2\frac{N+1}{n_e+0.5}\big)

\end{equation}

### In(exp)C2

Latex source code of the In(exp)C2 model is as follows:

\begin{equation}\label{eInexpC2}

\frac{F+1}{n_t\cdot(tfn_e+1)}\big(tfn_e\cdot\log_2\frac{N+1}{n_e+0.5}\big)

\end{equation}
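I(F)B2, In(exp)B2 and In(exp)C2 share the leading factor (F+1)/(n_t(tfn+1)) and differ only in the denominator of the logarithm and in which normalised frequency is used. The Python sketch below makes that explicit; all function names are hypothetical, and n_e is passed in as an already computed value.

```python
import math


def _shared_factor(F: float, n_t: float, tfn: float) -> float:
    """The factor (F + 1) / (n_t * (tfn + 1)) common to the three formulas."""
    return (F + 1) / (n_t * (tfn + 1))


def ifb2_weight(tfn: float, N: int, F: float, n_t: float) -> float:
    """I(F)B2: the logarithm uses the collection frequency F."""
    return _shared_factor(F, n_t, tfn) * tfn * math.log2((N + 1) / (F + 0.5))


def inexpb2_weight(tfn: float, N: int, F: float, n_t: float, n_e: float) -> float:
    """In(exp)B2: the logarithm uses n_e in place of F."""
    return _shared_factor(F, n_t, tfn) * tfn * math.log2((N + 1) / (n_e + 0.5))


def inexpc2_weight(tfn_e: float, N: int, F: float, n_t: float, n_e: float) -> float:
    """In(exp)C2: same expression as In(exp)B2, evaluated with tfn_e."""
    return inexpb2_weight(tfn_e, N, F, n_t, n_e)


# Made-up statistics, for illustration only.
print(ifb2_weight(tfn=2.0, N=1000, F=50, n_t=20))
print(inexpb2_weight(tfn=2.0, N=1000, F=50, n_t=20, n_e=19.6))
```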

### Notations

tf is the within-document frequency of t in d.

avg_l is the average document length in the collection.

l is the document length of d, which is the number of tokens in d.

N is the number of documents in the whole collection.

F is the term frequency of t in the whole collection.

n_t is the document frequency of t, that is, the number of documents in which t occurs.

tfn is the normalised term frequency. It is given by normalisation 2, whose LaTeX source is as follows:

\begin{equation}\label{eNormalisation2}

tfn=tf\cdot\log_2(1+c\cdot\frac{avg\_l}{l})

\end{equation}

where c is a free parameter.

tfn_e is also a normalised term frequency. It is given by a modified version of normalisation 2 that uses the natural logarithm instead of the base-2 logarithm. Its LaTeX source is as follows:

\begin{equation}\label{eNormalisation2e}

tfn_e=tf\cdot\log_e(1+c\cdot\frac{avg\_l}{l})

\end{equation}

where c is a free parameter.
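The two normalisations differ only in the base of the logarithm, as the following Python sketch shows. The function names and the default value c=1.0 are assumptions made for illustration; this page does not prescribe a value for c.

```python
import math


def normalisation2(tf: float, l: float, avg_l: float, c: float = 1.0) -> float:
    """tfn = tf * log2(1 + c * avg_l / l)."""
    return tf * math.log2(1.0 + c * avg_l / l)


def normalisation2e(tf: float, l: float, avg_l: float, c: float = 1.0) -> float:
    """tfn_e = tf * log_e(1 + c * avg_l / l)."""
    return tf * math.log(1.0 + c * avg_l / l)


# With c = 1, a document of exactly average length keeps tfn equal to tf,
# while a document twice as long has its term frequency scaled down.
print(normalisation2(tf=2, l=100, avg_l=100))
print(normalisation2(tf=2, l=200, avg_l=100))
```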

λ is both the mean and the variance of a Poisson distribution. It is given by F/N, where F is assumed to be much smaller than N.

n_e is given by N(1-(1-n_t/N)^F).
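Both collection-level quantities are simple to compute. The Python sketch below treats the final F in the expression for n_e as an exponent, which is an assumption about the intended formula; the function names are hypothetical.

```python
def poisson_lambda(F: float, N: float) -> float:
    """lambda = F / N, the mean and variance of the Poisson distribution."""
    return F / N


def expected_df(N: float, n_t: float, F: float) -> float:
    """n_e = N * (1 - (1 - n_t / N) ** F), with F read as an exponent (assumption)."""
    return N * (1.0 - (1.0 - n_t / N) ** F)


# Tiny made-up collection: 4 documents, t occurs twice in total, in 2 documents.
print(poisson_lambda(F=2, N=4))
print(expected_df(N=4, n_t=2, F=2))
```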

The relation f(n,m), used in the BB2 model, is given by the Stirling formula:

f(n,m)=(m+0.5)\cdot\log_2(n/m)+(n-m)\cdot\log_2 n

last edited 2005-05-19 09:28:33 by BenHe