Information retrieval, logic and uncertainty

## Workshop on the treatment of Uncertainty in Logic-based Models of Information Retrieval Systems

**16th September 1995.**

Hosted by the IR Group at the Department of Computing Science, Glasgow, Scotland

Organised and chaired by Mounia Lalmas
The second workshop in this series is also available.

**A qualitative ranking method for logical information retrieval models**

Theo Huibers* and
Nathalie Denos**

*Utrecht University, Netherlands

**LGI-IMAG(CLIPS), Université Joseph Fourier, France

**Abstract**

In information retrieval the ranking of documents is normally regarded as a necessary requirement. In logical models for retrieval, the relevance of a document d to a given query q depends on the validity of a formula "d about q", with this aboutness defined in terms of a logic. This aboutness decision does not have degrees of relevance; therefore van Rijsbergen proposed a ranking for these models, based on an estimation of the probability that a document is about a given query. Typically, logical information retrieval models are created by proposing a model which cannot handle uncertainty, and afterwards the model is extended with a probabilistic approach to handle uncertainty. These probabilistic approaches rely on quantitative amounts of information, i.e. they deal with numbers. In this article we argue that the van Rijsbergen computation of the probability "d -> q" is not a case of statistical information, but of preferences and backgrounds. We propose a ranking of the documents based on a ranked list of postulates.

**An object-orientated probabilistic logic for information retrieval**

Thomas Rölleke

Universität Dortmund, Informatik VI, Germany

**Abstract**

This paper presents an information retrieval (IR) model based upon methods of object-orientated data modelling, probability theory and logic. The work is motivated by the growing need for integrating database (DB) systems and information retrieval systems. In particular, the requirements of IR in multimedia databases are considered. The proposed model supports retrieval among several databases and allows for representing indexing knowledge, domain knowledge, and the knowledge of the documents themselves within the same framework.

**Probability, information and information retrieval**

Gianni Amati* and
C J van Rijsbergen**

*Fondazione Ugo Bordoni, Italy

**Glasgow University, Scotland

**Abstract**

The paper introduces a general probabilistic framework for defining models of Information Retrieval. We present a foundational theory which combines statistical and subjective points of view of probability in the context of Information Retrieval. We use Carnap's notion of semantic information, subjective probabilities and utility theory. The framework also takes into account the statistical information given by the data. In particular, the vector space model and the probabilistic model are shown to be 'dual'. Two extensions of these models are introduced. Finally we show how to apply this theory to Information Filtering.

**Reasoning about keywords using default logic**

Tony Hunter

Imperial College, London, England

**Abstract**

The aim of information retrieval is to provide a user with the "best possible" information from a database. The problem of information retrieval is determining what constitutes the best possible information for a given user. A common form of interaction for information retrieval is for the user to offer a set of keywords. These are then used by the information retrieval system to identify information that meets the user's needs. For example, in a bibliographic database, a user might be interested in finding papers on some topic. The keywords would be an attempt to delineate that topic, and so would be used to improve precision (ensuring that a significant proportion of the items retrieved are relevant to the user) and recall (ensuring that a significant proportion of the relevant items are retrieved).
To support this, we need to reason about keywords: to identify, for example, meanings, synonyms and related terms, to resolve ambiguities, and more generally to make the process more knowledge-based. Here, we consider using default logic for reasoning with syntactic, semantic and statistical information about keywords.
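
The two measures named in this abstract can be illustrated with a minimal sketch. The function name `precision_recall` and the paper identifiers are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: ids of the items the system returned
    relevant:  ids of the items that are relevant to the user
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # items both retrieved and relevant
    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical bibliographic query: four papers retrieved, three relevant overall.
p, r = precision_recall(["p1", "p2", "p3", "p4"], ["p2", "p4", "p7"])
print(p, r)
```

Here two of the four retrieved papers are relevant, and two of the three relevant papers were retrieved, so precision is 2/4 and recall is 2/3.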

**Issues on the implementation of general imaging on top of probabilistic datalog**

Thomas Rölleke* and Fabio Crestani**

*Universität Dortmund, Informatik VI, Germany

**Glasgow University, Scotland and Padua University, Italy

**Abstract**

In 1986 Van Rijsbergen proposed to interpret Information Retrieval as the process of selecting, for a query q, those documents di for which the logical formula di -> q (di implies q) holds. The desired ranking of the documents according to their relevance to the query is obtained by computing the probabilities P(di -> q). Logical Imaging provides a way of evaluating the probability of a logical implication in the context of the Possible World semantics. In 1993 a model of IR based on Logical Imaging was proposed by Crestani and Van Rijsbergen. Later they proposed an extension of this model using a generalised form of Logical Imaging.
Probabilistic Datalog, a probabilistic extension of the Datalog logical model of databases proposed by Fuhr in 1994, enables the modelling of Information Retrieval as uncertain inference. The expressiveness of Probabilistic Datalog is such that it enables modelling both new models of hypermedia retrieval and classical probabilistic models of Information Retrieval.
In this paper we report on some results and some open issues regarding the implementation of the General Imaging model on top of Probabilistic Datalog. We intend to show that this combination not only gives the Imaging model a powerful tool for implementation, but also extends the potential of the Imaging model itself.
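
The idea of computing P(di -> q) by imaging can be sketched as follows. This is a toy illustration of standard imaging only, not the paper's Probabilistic Datalog implementation, and it does not cover the generalised form. It assumes that terms play the role of possible worlds, that a document is true in the worlds of the terms it contains, and that a term-to-term similarity picks the closest such world. The prior probabilities, the similarity values and all names are made up for the example.

```python
def imaging_score(prior, similarity, doc_terms, query_terms):
    """Estimate P(d -> q) by imaging on d, with terms as possible worlds.

    prior:       dict term -> prior probability (assumed to sum to 1)
    similarity:  function (term, term) -> number, larger means closer
    doc_terms:   terms (worlds) in which the document d is true
    query_terms: terms (worlds) in which the query q is true
    """
    doc_terms = set(doc_terms)
    if not doc_terms:
        return 0.0
    posterior = {t: 0.0 for t in doc_terms}
    for term, mass in prior.items():
        if term in doc_terms:
            target = term  # d already holds here, so the mass stays put
        else:
            # Move the mass to the most similar world in which d holds.
            target = max(sorted(doc_terms), key=lambda t: similarity(term, t))
        posterior[target] += mass
    # After imaging, sum the mass of the worlds in which q holds.
    return sum(mass for t, mass in posterior.items() if t in query_terms)


# Hypothetical toy data.
prior = {"logic": 0.4, "probability": 0.3, "retrieval": 0.2, "datalog": 0.1}
table = {
    ("probability", "logic"): 0.7,
    ("probability", "retrieval"): 0.2,
    ("datalog", "logic"): 0.6,
    ("datalog", "retrieval"): 0.8,
}


def sim(a, b):
    # Symmetric lookup; unknown pairs count as completely dissimilar.
    return table.get((a, b), table.get((b, a), 0.0))


score = imaging_score(prior, sim, {"logic", "retrieval"}, {"retrieval", "datalog"})
print(score)
```

In this example the mass of "probability" moves to "logic" and the mass of "datalog" moves to "retrieval", so the document's score is the prior of "retrieval" plus the transferred prior of "datalog". Ranking a collection would repeat the computation for each document.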

**Using conditional logic for a more complete modelling of information retrieval**

Jian-Yun Nie

University of Montreal, Canada

**Abstract**

The existing IR models often consider the relevance relationship between a document and a query in isolation. In fact, this relationship is affected by a number of other factors such as the user's knowledge and intention. An adequate modelling of IR should take all these factors into account. Traditional formal tools are insufficient for this task. What formal framework then should we take for the modelling? In this talk, I will try to show that conditional logic is an adequate candidate. I will first make an informal comparison between conditional logic and relevance. Then I will describe a particular conditional logic which has interesting features.

**From a qualitative towards a quantitative representation of uncertainty in a situation theory based model of an information retrieval system**

Mounia Lalmas

Glasgow University, Scotland

**Abstract**

In [LvR93, Lal95], two new models of an information retrieval (IR) system are proposed which aim to represent the flow of information [Dre81] in IR. The two models cater respectively for an unstructured representation and a structured representation of a document's information content. It was found that the components of an IR model that capture the flow of information can be classified as qualitative or quantitative. The representations of the qualitative components of the two models are based on Situation Theory [Bar89, Dev91]. The quantitative components are represented, in the unstructured case, by a general uncertainty mechanism and, in the structured case, by Dempster-Shafer's Theory of Evidence [Sha76]. Some of these components involve uncertainty. This talk will concentrate on them.
In these models the representation of the flow of information is defined by a set of constraints (a qualitative component) which, in Situation Theory, model semantic relationships between information items. These constraints can be either unconditional (they always hold) or conditional (they may not hold). An uncertainty function (a quantitative component) is attached to every conditional constraint. This function measures the uncertainty attached to the constraint (to what extent the constraint holds). Both the constraints and the uncertainty attached to the conditional constraints offer some representation of uncertainty: qualitative and quantitative, respectively. The appropriate capturing of these relationships is crucial to the implementation of the models. There are systems available from which appropriate semantic relationships can be extracted: namely thesauri.
The semantic relationships stored in most of these thesauri relate only terms (i.e., single words or groupings of words). However, the use of existing thesauri is preferable to the onerous task of determining each semantic relationship. Two types of thesauri exist. First, we have those which define two terms to be semantically related if they often co-occur within some predefined boundaries (e.g., text, paragraph, or sentence). These types of thesauri have proven unsatisfactory in IR because they often do not capture correct semantic relationships. The second type of thesauri define terms that are indeed semantically related (e.g., synonyms, broader terms, narrower terms or related terms). These thesauri are usually manually built. The benefit is that very few false relationships are produced since the process is done by human experts.
In this implementation, an on-line manually built thesaurus is used to derive the constraints. The thesaurus is known as WordNet. In this way, correct relationships are provided; the remaining tasks are to select those which form constraints, and to associate uncertainty with the obtained constraints. A very simple determination of those uncertainty values is adopted, based on the fact that WordNet represents polysemy information. The purpose of this talk is to describe the implementation of these constraints and the uncertainty function associated with the conditional constraints. Other components related to the treatment of uncertainty will also be discussed. The different experiments carried out will be described, and some results may be given.
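
Dempster-Shafer's Theory of Evidence, named above for the structured model, combines independent pieces of evidence with Dempster's rule of combination. The following is a minimal sketch of that general rule only; it is not the model described in the abstract, and the frame of discernment and the mass values are hypothetical.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset (a subset of the frame of
    discernment) to its mass; the masses of each function sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + x * y
        else:
            conflict += x * y  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalise so that the remaining masses sum to 1 again.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


# Hypothetical frame with two hypotheses, "a" and "b".
A, B = frozenset({"a"}), frozenset({"b"})
FRAME = A | B
m1 = {A: 0.6, FRAME: 0.4}  # first source partly supports "a"
m2 = {B: 0.5, FRAME: 0.5}  # second source partly supports "b"
m = combine(m1, m2)
print(m)
```

In this example the conflicting mass is 0.6 × 0.5 = 0.3, so the surviving products are divided by 0.7, giving 3/7 to "a", 2/7 to "b" and 2/7 to the whole frame.
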

**References**

[Bar89] J. Barwise, *The Situation in Logic*, CSLI Lecture Notes 17, Stanford, California (1989).

[Dev91] K.J. Devlin, *Logic and Information*, Cambridge University Press, Cambridge, England (1991).

[Dre81] F. Dretske, *Knowledge and the Flow of Information*, Bradford Books, MIT Press, Cambridge, Massachusetts (1981).

[Lal95] M. Lalmas, *Theories of Information and Uncertainty for the modelling of Information Retrieval: an application of Situation Theory and Dempster-Shafer's Theory of Evidence.* PhD Thesis, University of Glasgow (In progress).

[LvR93] M. Lalmas and C.J. van Rijsbergen, *A model of an information retrieval system based on situation theory and Dempster-Shafer theory of evidence*, in V.S. Alagar, S. Berger, and F. Dong (eds), Incompleteness and Uncertainty in Information Systems, Concordia University, Montreal (1993).

[Sha76] G. Shafer, *A Mathematical Theory of Evidence.* Princeton University Press (1976).

Also published as Technical Report no. TR-1995-18, Department of Computing Science, University of Glasgow, Glasgow, Scotland. Email reports@dcs.gla.ac.uk for a nice, bound, paper copy.

Ian Ruthven