Diff for "Terrier/DOTGOV"

Differences between revisions 2 and 3

Line 8 (revision 2):
Indexing the DOTGOV collection is easy with Terrier. No terrier.properties are required to be altered from the default created by trec_setup.

Line 8 (revision 3):
Indexing the DOTGOV collection is easy with Terrier. No terrier.properties are required to be altered from the default created by trec_setup. If you wish URLs in your index, then set the following properties:
{{{
trec.collection.class=TRECWebCollection
indexer.meta.forward.keys=docno,url
indexer.meta.forward.keylens=26,256
}}}

See the Terrier documentation on [http://terrier.org/docs/current/terrier_http.html Web-based Terrier] to see how to build a Web search engine for this collection.

DOTGOV

DOTGOV is a Web crawl of US government websites in the .gov domain, and was used by the TREC Web track from 2002 to 2004. It can be obtained from the University of Glasgow. The topics and qrels are available from the TREC website.

Indexing the DOTGOV collection is easy with Terrier: no properties in terrier.properties need to be changed from the defaults created by trec_setup. If you want URLs stored in your index, set the following properties:

{{{
trec.collection.class=TRECWebCollection
indexer.meta.forward.keys=docno,url
indexer.meta.forward.keylens=26,256
}}}
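
If you would rather script the change than edit the file by hand, the following is a minimal sketch of one way to do it. It is not part of Terrier: the helper name `set_properties` and the dictionary `URL_PROPERTIES` are hypothetical, and the sketch assumes the properties file contains only simple single-line `key=value` entries.

```python
from pathlib import Path

# The three properties listed above, in the order given on this page.
URL_PROPERTIES = {
    "trec.collection.class": "TRECWebCollection",
    "indexer.meta.forward.keys": "docno,url",
    "indexer.meta.forward.keylens": "26,256",
}


def set_properties(path, updates):
    """Set key=value pairs in a Java-style properties file (hypothetical helper).

    Keys that already exist are rewritten in place; missing keys are
    appended at the end. Comment lines and unrelated keys are left untouched.
    """
    path = Path(path)
    lines = path.read_text().splitlines() if path.exists() else []
    remaining = dict(updates)
    result = []
    for line in lines:
        stripped = line.strip()
        is_comment = stripped.startswith(("#", "!"))
        key = stripped.split("=", 1)[0].strip()
        if stripped and not is_comment and "=" in stripped and key in remaining:
            # Replace the existing value with the requested one.
            result.append(f"{key}={remaining.pop(key)}")
        else:
            result.append(line)
    # Append any properties that were not already present in the file.
    result.extend(f"{key}={value}" for key, value in remaining.items())
    path.write_text("\n".join(result) + "\n")
```

Calling `set_properties("etc/terrier.properties", URL_PROPERTIES)` would then apply the three settings, where the path is an assumption about where your installation keeps the file. Running it a second time leaves the file unchanged.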

To build a Web search engine for this collection, see the Terrier documentation on [http://terrier.org/docs/current/terrier_http.html Web-based Terrier].

last edited 2011-06-14 19:47:40 by CraigMacdonald