
UK-2006

UK-2006 is a corpus of 90 million documents crawled from the .uk domain in 2006. For more information on the corpus, see http://barcelona.research.yahoo.net/webspam/datasets/uk2006/

To index UK-2006, we use the following properties:

indexer.meta.forward.keys=uuid,url
indexer.meta.forward.keylens=37,256
indexer.meta.reverse.keys=uuid
trec.collection.class=WARC09Collection
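The sketch below shows how the metadata properties above fit together. It is a hypothetical helper, not part of Terrier.

- It parses the properties.
- It pairs each forward meta key with the maximum length at the same position in indexer.meta.forward.keylens. Here that gives uuid with 37 and url with 256.
- It lists the reverse keys. As we understand it, these are the keys that support looking up a document by its metadata value, here its uuid.

Two simplifications are assumptions of this sketch. It requires every reverse key to also be a forward key. Its parser handles only plain `key=value` lines, not the full Java properties syntax.

```python
# Hypothetical helper (not part of Terrier) illustrating how the UK-2006
# indexing properties relate to one another.
PROPERTIES = """\
indexer.meta.forward.keys=uuid,url
indexer.meta.forward.keylens=37,256
indexer.meta.reverse.keys=uuid
trec.collection.class=WARC09Collection
"""


def parse_properties(text):
    """Parse simple key=value lines; continuations and escapes are not handled."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def meta_layout(props):
    """Pair forward meta keys with their positional max lengths; list reverse keys."""
    keys = [k for k in props["indexer.meta.forward.keys"].split(",") if k]
    lens = [int(n) for n in props["indexer.meta.forward.keylens"].split(",") if n]
    if len(keys) != len(lens):
        raise ValueError("forward.keys and forward.keylens must have the same count")
    forward = dict(zip(keys, lens))
    reverse = [k for k in props.get("indexer.meta.reverse.keys", "").split(",") if k]
    # Assumption of this sketch: a reverse key must also be stored as a forward key.
    missing = [k for k in reverse if k not in forward]
    if missing:
        raise ValueError("reverse keys not among forward keys: " + ",".join(missing))
    return forward, reverse


if __name__ == "__main__":
    forward, reverse = meta_layout(parse_properties(PROPERTIES))
    print(forward)  # {'uuid': 37, 'url': 256}
    print(reverse)  # ['uuid']
```

Keeping keys and lengths as two parallel comma-separated lists makes them easy to get out of step. The count check in `meta_layout` is meant to catch exactly that mistake before indexing begins.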

last edited 2010-03-04 17:28:02 by CraigMacdonald