What's New?

Terrier 1.0

New Querying APIs

A new querying API has been implemented to make Terrier suitable for more applications, including interactive ones. To this end, every query is encapsulated in a SearchRequest object, which the Manager passes through the different stages of retrieval:

A query first has to be parsed into a syntax tree. This allows Terrier to identify terms, phrases, requirements, fields, proximity requirements, weights etc. from the grammar of the query entered. For this we use a parser generated by the Antlr parser generator.

The query tree is then traversed. This allows three operations: each term is passed through the TermPipeline (stemming, stopping etc.); controls are identified and removed; and terms are aggregated for the Matching process.

The aggregated terms (known as MatchingTerms) form the query for the main retrieval (Matching) stage, where relevant documents are determined and scores are assigned using the assigned weighting model. There are two additional (new) substages at this time:

Post Processing is for application-specific code to alter the result set in an unspecified way. Terrier provides automatic QueryExpansion, where relevant terms from the top N documents are added to the query and the matching stage is rerun.

Post Filtering is like Post Processing, but only one document of the result set may be operated on at any one time. This allows results to be filtered out (e.g. those not in a specific DNS domain for search engine results).
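The stages above can be sketched as a small, self-contained pipeline. This is an illustrative Python sketch only, not Terrier's Java API: the `SearchRequest` fields, the function names, the toy stemmer, the stopword list, the in-memory index, and the reading of `scope` as a domain prefix are all assumptions made for the example. Only the stage order and the list of allowed control names come from this page.

```python
from dataclasses import dataclass, field

# Control names taken from the querying.allowed.controls example on this page.
ALLOWED_CONTROLS = {"c", "range", "scope", "qe", "qemodel", "start", "end"}
STOPWORDS = {"the", "of", "a"}  # hypothetical stopword list


@dataclass
class SearchRequest:
    """Hypothetical container that carries one query through every stage."""
    query: str
    controls: dict = field(default_factory=dict)
    matching_terms: list = field(default_factory=list)
    results: list = field(default_factory=list)  # (docid, score) pairs


def parse(request):
    """Stand-in for the generated parser: split off 'control:value' tokens."""
    terms = []
    for token in request.query.lower().split():
        name, sep, value = token.partition(":")
        if sep and name in ALLOWED_CONTROLS:
            request.controls[name] = value  # control identified and removed
        else:
            terms.append(token)
    return terms


def stop(term):
    """Stopping stage: returning None drops the term."""
    return None if term in STOPWORDS else term


def stem(term):
    """Toy stemmer (strips a trailing 's'); a real stemmer would go here."""
    return term[:-1] if term.endswith("s") and len(term) > 3 else term


def run_term_pipeline(terms, stages):
    """Pass each term through every stage in order, skipping dropped terms."""
    kept = []
    for term in terms:
        for stage in stages:
            term = stage(term)
            if term is None:
                break
        if term is not None:
            kept.append(term)
    return kept


def match(request, index):
    """Toy matching: a document's score is the number of query terms it has."""
    scores = {}
    for term in request.matching_terms:
        for docid in index.get(term, ()):
            scores[docid] = scores.get(docid, 0) + 1
    request.results = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def run_query(query, index, post_processes=(), post_filters=()):
    request = SearchRequest(query)
    request.matching_terms = run_term_pipeline(parse(request), [stop, stem])
    match(request, index)
    for process in post_processes:  # each one sees the whole result set
        process(request)
    for keep in post_filters:       # each one sees a single document at a time
        request.results = [r for r in request.results if keep(request, r[0])]
    return request


def in_scope(request, docid):
    """Hypothetical post filter: keep documents whose id starts with 'scope'."""
    return docid.startswith(request.controls.get("scope", ""))


index = {"dog": ["a.example/1", "b.example/2"], "terrier": ["a.example/1"]}
request = run_query("the terriers dogs scope:a.example", index,
                    post_filters=[in_scope])
```

The split between the two substages shows in the signatures: a post process receives the whole request and may reorder or rerun anything, while a post filter only answers keep or drop for one document.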

The querying stage of Terrier is governed by controls, which are string->string mappings. These can be set in two places:

querying.default.controls=c:1.0,start:0,end:999
querying.allowed.controls=c,range,scope,qe,qemodel,start,end
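To make the format of these two properties concrete, here is a minimal Python sketch of how such values could be read. The function names are hypothetical, and the merge rule (a requested control is applied only if its name appears in the allowed list) is an assumption about what "allowed" means here, not a description of Terrier's code.

```python
def parse_controls(value):
    """Parse 'name:value,name:value' into a dict of string -> string."""
    controls = {}
    for pair in value.split(","):
        pair = pair.strip()
        if not pair:
            continue
        name, _, setting = pair.partition(":")
        controls[name.strip()] = setting.strip()
    return controls


def effective_controls(defaults, allowed, requested):
    """Start from the defaults, then apply requested controls that are allowed.

    Assumption: requested controls outside the allowed set are ignored.
    """
    controls = dict(defaults)
    for name, setting in requested.items():
        if name in allowed:
            controls[name] = setting
    return controls


defaults = parse_controls("c:1.0,start:0,end:999")
allowed = set("c,range,scope,qe,qemodel,start,end".split(","))
# 'debug' is a made-up control name used to show the filtering.
merged = effective_controls(defaults, allowed,
                            {"end": "9", "qe": "on", "debug": "1"})
```

All values stay strings, matching the string->string description above; any conversion to numbers would be left to whichever stage consumes the control.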

I have documented the controls present in the Terrier 1.0 core separately: Terrier/QueryingControls

Controls are often used to turn on post processes or post filters. However, controls need to be mapped into class names, using the querying.postprocesses.controls and querying.postfilters.controls properties in the terrier.properties file. In addition, as order is often important, you should specify the order using the querying.postprocesses.order and querying.postfilters.order properties.

querying.postprocesses.order=QueryExpansion
querying.postprocesses.controls=qe:QueryExpansion

querying.postfilters.order=Scope
querying.postfilters.controls=scope:Scope
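The relationship between the `.controls` and `.order` properties can be sketched as follows. This is a hedged Python illustration, not Terrier's implementation: the function names are hypothetical, and it assumes a control switches its class on whenever it is set to any value other than `off`.

```python
def parse_mapping(value):
    """Parse 'control:ClassName,control:ClassName' into a dict."""
    mapping = {}
    for pair in value.split(","):
        pair = pair.strip()
        if pair:
            control, _, class_name = pair.partition(":")
            mapping[control.strip()] = class_name.strip()
    return mapping


def enabled_classes(order, mapping, controls):
    """Return the class names to run, in the sequence given by `order`.

    Assumption: a control enables its class when it is set and not 'off'.
    """
    switched_on = {
        class_name
        for control, class_name in mapping.items()
        if controls.get(control, "off") != "off"
    }
    names = [name.strip() for name in order.split(",") if name.strip()]
    return [name for name in names if name in switched_on]


post_processes = enabled_classes(
    "QueryExpansion", parse_mapping("qe:QueryExpansion"), {"qe": "on"}
)
post_filters = enabled_classes(
    "Scope", parse_mapping("scope:Scope"), {"qe": "on"}
)
```

Keeping the order in its own property means the sequence is fixed by configuration, regardless of the order in which controls happen to be set for a given query.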

New Indexing API

Terrier 1.0 also has a new indexing API, which allows more diverse collections of documents to be indexed. To this end, Terrier breaks indexing up into several responsibilities:


last edited 2005-01-19 17:50:33 by CraigMacdonald