Terrier Querying Language

TODO for this page

Add

  • Phrasal querying

  • Proximity querying

The current state of play

1.0 Final

  • A query consists of one or more term_definitions, control_definitions and phrase_definitions, in any order.


Query =
  (Term_definition | Control_definition | Phrase_definition)+
  • Each term_definition is a term prefixed by an optional boolean requirement operator, + or -

Term_definition =
  (+|-|)Term
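As an illustration, a single term_definition can be split into its requirement prefix and its term with one regular expression. This is a minimal sketch, not Terrier's parser. The helper name `parse_term_definition` is hypothetical, and the term part follows the Term rule given further down, with the boost caret escaped.

```python
import re

# Term_definition = (+|-|)Term, where Term is alphanumerics plus an optional ^boost.
# Group 1 captures the requirement prefix ("+", "-" or ""), group 2 the term.
TERM_DEFINITION = re.compile(r"([+-]?)([A-Za-z0-9]+(?:\^[0-9]+)?)")


def parse_term_definition(token):
    """Split one term_definition into (prefix, term); raise on anything else."""
    match = TERM_DEFINITION.fullmatch(token)
    if match is None:
        raise ValueError(f"not a term_definition: {token!r}")
    return match.group(1), match.group(2)


print(parse_term_definition("+t3"))   # ('+', 't3')
print(parse_term_definition("t1^2"))  # ('', 't1^2')
```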

  • Each phrase_definition is a sequence of two or more terms enclosed in double quotes

Phrase_definition =
  " Term Term+ "
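A phrase_definition can be recognised with a pattern that requires at least two terms between the quotes. The sketch below assumes terms follow the Term rule given further down, and it returns each phrase as a list of its terms. `find_phrases` is a hypothetical helper, not part of Terrier.

```python
import re

# Term as in the grammar: alphanumerics with an optional ^boost suffix.
TERM = r"[A-Za-z0-9]+(?:\^[0-9]+)?"
# Phrase_definition: a double-quoted run of at least two whitespace-separated terms.
PHRASE = re.compile(rf'"\s*({TERM}(?:\s+{TERM})+)\s*"')


def find_phrases(query):
    """Return the term list of every phrase_definition found in the query."""
    return [match.group(1).split() for match in PHRASE.finditer(query)]


print(find_phrases('t1 "information retrieval" t2'))  # [['information', 'retrieval']]
print(find_phrases('"single"'))                       # [] (a phrase needs two terms)
```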
  • Each control_definition is either a known field name (optionally prefixed by the boolean requirement operator + or -) followed by a colon and a term, or a control_name followed by a colon and a control_value.

Control_definition =
  (+|-|)field_name:Term |
  Control_name:Control_value
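Both alternatives of the rule share the `name:value` shape. Only the set of known field names tells them apart, so a parser has to look the name up. The sketch below makes two assumptions. The field set `KNOWN_FIELDS` is hypothetical; a real system would take it from its index. A + or - prefix is accepted only before field names, which is how the rule above reads. The function name `parse_control_definition` and the control name `ctrl` are also illustrative.

```python
import re

# Hypothetical field names; a real deployment would read these from its index.
KNOWN_FIELDS = {"title", "field"}

TERM = re.compile(r"[A-Za-z0-9]+(?:\^[0-9]+)?")
# name:value with an optional +/- prefix; the value runs up to the next space.
CONTROL_DEFINITION = re.compile(r"([+-]?)([A-Za-z0-9]+):(\S+)")


def parse_control_definition(token):
    """Classify a colon token as a field restriction or a control setting."""
    match = CONTROL_DEFINITION.fullmatch(token)
    if match is None:
        raise ValueError(f"not a control_definition: {token!r}")
    prefix, name, value = match.groups()
    if name in KNOWN_FIELDS:
        # A field restriction must be followed by a single Term.
        if TERM.fullmatch(value) is None:
            raise ValueError(f"field {name!r} must be followed by a term")
        return ("field", prefix, name, value)
    if prefix:
        # The grammar only allows + or - in front of field names.
        raise ValueError(f"+/- is not allowed before control {name!r}")
    return ("control", name, value)


print(parse_control_definition("+title:t4"))  # ('field', '+', 'title', 't4')
print(parse_control_definition("ctrl:a,b"))   # ('control', 'ctrl', 'a,b')
```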
  • Each term is a sequence of alphanumeric characters, optionally followed by a boost value: a caret (^) and a sequence of digits

Term =
  ([A-Za-z0-9]+)(\^[0-9]+)?
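A term and its optional boost can be separated in one match. In a real regular expression the caret has to be escaped so that it matches a literal "^" rather than acting as an anchor. This is a minimal sketch with a hypothetical helper `parse_term`. When no ^suffix is present it returns `None` as the boost, because this page does not state a default boost.

```python
import re

# "\^" matches a literal caret; an unescaped "^" would be a start-of-string anchor.
TERM = re.compile(r"([A-Za-z0-9]+)(?:\^([0-9]+))?")


def parse_term(token):
    """Return (word, boost); boost is None when the term has no ^suffix."""
    match = TERM.fullmatch(token)
    if match is None:
        raise ValueError(f"not a term: {token!r}")
    word, boost = match.groups()
    return word, (int(boost) if boost is not None else None)


print(parse_term("retrieval^3"))  # ('retrieval', 3)
print(parse_term("retrieval"))    # ('retrieval', None)
```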
  • Each control_name is a sequence of alphanumeric characters

Control_name =
  ([A-Za-z0-9]+)
  • Each control value is anything except a space

Control_value =
  ([^ ]+)
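Taken together, the rules let a whole query be split into its definitions in one left-to-right scan. The sketch below tries a phrase first, then a `name:value` token, then a plain term, and requires each definition to end at whitespace or at the end of the query. It rests on the same assumptions as before. `KNOWN_FIELDS` is a hypothetical field set, `ctrl` is an illustrative control name, and `tokenize` is not a Terrier function.

```python
import re

# Hypothetical field names; a real system would read these from its index.
KNOWN_FIELDS = {"title", "field"}

TERM = r"[A-Za-z0-9]+(?:\^[0-9]+)?"
# One definition, which must end at whitespace or at the end of the query.
DEFINITION = re.compile(
    rf'(?:(?P<phrase>"\s*{TERM}(?:\s+{TERM})+\s*")'
    r"|(?P<colon>[+-]?[A-Za-z0-9]+:\S+)"
    rf"|(?P<term>[+-]?{TERM}))"
    r"(?=\s|$)"
)


def tokenize(query):
    """Split a query into (kind, text) pairs: term, phrase, field or control."""
    definitions = []
    pos = 0
    while True:
        # Skip the whitespace that separates definitions.
        while pos < len(query) and query[pos].isspace():
            pos += 1
        if pos == len(query):
            return definitions
        match = DEFINITION.match(query, pos)
        if match is None:
            raise ValueError(f"cannot parse query at offset {pos}: {query[pos:]!r}")
        kind = match.lastgroup
        if kind == "colon":
            # Known field names denote field restrictions; any other name is a control.
            name = match.group().lstrip("+-").split(":", 1)[0]
            kind = "field" if name in KNOWN_FIELDS else "control"
        definitions.append((kind, match.group()))
        pos = match.end()


print(tokenize("t1 +t3 title:t4"))  # [('term', 't1'), ('term', '+t3'), ('field', 'title:t4')]
```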

Query Language Semantics

  • A query term that occurs outside any phrase, field or requirement construct is optional: it is not individually required to appear in the retrieved documents. Conversely, a term that appears in a phrase, in a field restriction or with the + requirement must appear in every retrieved document (the - requirement instead excludes the term, as in -field:t3 below).

  • The query: t1 t2 t3 title:t4 should be interpreted as: (t1 OR t2 OR t3) AND title:t4. In other words, the query should retrieve documents that contain term t4 in the title and at least one of the terms t1, t2, t3.

  • The query: t1 t2 +t3 +field:t4 -field:t3 should be interpreted as: (t1 OR t2) AND t3 AND field:t4 AND NOT field:t3. This query should retrieve documents that contain term t3 (but not in the field), contain term t4 in the field, and contain at least one of the terms t1, t2.

  • The query: t1 t2 field:(t2 t3) should be interpreted as: (t1 OR t2) AND field:t2 AND field:t3 = (t1 AND field:t2 AND field:t3) OR (t2 AND field:t2 AND field:t3) = (t1 AND field:t2 AND field:t3) OR (field:t2 AND field:t3), because any document with t2 in the field also contains t2. In this case the requirement that term t2 should appear in the field overrides the unqualified occurrence of term t2 in the query.
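The three example interpretations can be checked with a small boolean matcher. It is a sketch under simplifying assumptions:

  • A document is modelled as a mapping from field name to the set of terms in that field. There are no positions, so phrases are left out.
  • A query is given as already-parsed (prefix, field, term) triples.
  • An unfielded -term is treated as excluding the term from the whole document. This page only shows the fielded form.
  • The grouped form field:(t2 t3) is not covered by the 1.0 grammar above, so it is written out as two field restrictions.

`contains`, `matches` and the field name `body` are hypothetical.

```python
def contains(doc, term, field=None):
    """True if term occurs in the named field, or in any field when field is None."""
    if field is not None:
        return term in doc.get(field, set())
    return any(term in terms for terms in doc.values())


def matches(query, doc):
    """Boolean match of a parsed query against a document.

    query: list of (prefix, field, term) triples, prefix in {"", "+", "-"},
           field None for an unqualified term.
    doc:   dict mapping each field name to the set of terms it contains.
    """
    optional = []
    for prefix, field, term in query:
        if prefix == "" and field is None:
            optional.append(term)  # plain term: only counts towards the OR group
        elif prefix == "-":
            if contains(doc, term, field):  # excluded term or field:term
                return False
        elif not contains(doc, term, field):  # +term, field:term or +field:term
            return False
    # At least one plain term must occur, unless the query has none.
    return not optional or any(contains(doc, term) for term in optional)


# t1 t2 t3 title:t4
q1 = [("", None, "t1"), ("", None, "t2"), ("", None, "t3"), ("", "title", "t4")]
print(matches(q1, {"title": {"t4"}, "body": {"t2"}}))  # True
print(matches(q1, {"title": {"t1"}, "body": {"t4"}}))  # False: t4 not in title

# t1 t2 +t3 +field:t4 -field:t3
q2 = [("", None, "t1"), ("", None, "t2"), ("+", None, "t3"),
      ("+", "field", "t4"), ("-", "field", "t3")]
print(matches(q2, {"field": {"t4"}, "body": {"t1", "t3"}}))  # True
print(matches(q2, {"field": {"t4", "t3"}, "body": {"t1"}}))  # False: t3 in field

# t1 t2 field:(t2 t3), written out as t1 t2 field:t2 field:t3
q3 = [("", None, "t1"), ("", None, "t2"), ("", "field", "t2"), ("", "field", "t3")]
print(matches(q3, {"field": {"t2", "t3"}}))  # True, although t1 is absent
```

The last case shows the override described above: evaluating (t1 OR t2) literally gives the same result as dropping it, because field:t2 already implies t2.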

Terrier 1.1+

  • Arbitrary boolean expressions

Could have

  • Wildcards, fuzzy matching, etc.


CategoryTerrier

last edited 2005-01-19 17:45:55 by CraigMacdonald