UK-2006
UK-2006 is a corpus of 90M documents crawled from the .uk domain in 2006. For more information on the corpus, see
http://barcelona.research.yahoo.net/webspam/datasets/uk2006/
To index UK-2006, we use the following properties:
indexer.meta.forward.keys=uuid,url
indexer.meta.forward.keylens=37,256
indexer.meta.reverse.keys=uuid
trec.collection.class=WARC09Collection
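The settings above pair up position by position. indexer.meta.forward.keylens gives one length for each entry in indexer.meta.forward.keys, so uuid is stored in up to 37 characters and url in up to 256. indexer.meta.reverse.keys names the field (uuid) that can be used to look documents up. The Java sketch below is a hedged illustration of that reading. It uses only java.util.Properties rather than any Terrier API. The class and method names (Uk2006Props, forwardKeyLengths, reverseKeysAreForward) are hypothetical. The rule that each reverse key must also be a forward key is an assumption, not something stated on this page.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: checks the UK-2006 meta-index settings for consistency.
// It does not call Terrier; it only parses the property values.
public class Uk2006Props {
    // The four settings from the UK-2006 configuration, in .properties syntax.
    static final String CONFIG = String.join("\n",
        "indexer.meta.forward.keys=uuid,url",
        "indexer.meta.forward.keylens=37,256",
        "indexer.meta.reverse.keys=uuid",
        "trec.collection.class=WARC09Collection");

    static Properties defaults() {
        Properties p = new Properties();
        try {
            p.load(new StringReader(CONFIG));
        } catch (IOException e) {
            throw new IllegalStateException(e); // reading from a String should not fail
        }
        return p;
    }

    // Pairs each forward meta key with the length at the same list position.
    static Map<String, Integer> forwardKeyLengths(Properties p) {
        String[] keys = p.getProperty("indexer.meta.forward.keys", "").split(",");
        String[] lens = p.getProperty("indexer.meta.forward.keylens", "").split(",");
        if (keys.length != lens.length) {
            throw new IllegalArgumentException(
                "keys and keylens differ in count: " + keys.length + " vs " + lens.length);
        }
        Map<String, Integer> out = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            out.put(keys[i].trim(), Integer.parseInt(lens[i].trim()));
        }
        return out;
    }

    // Assumption: every reverse key should also be declared as a forward key.
    static boolean reverseKeysAreForward(Properties p) {
        Map<String, Integer> forward = forwardKeyLengths(p);
        for (String k : p.getProperty("indexer.meta.reverse.keys", "").split(",")) {
            if (!forward.containsKey(k.trim())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Properties p = defaults();
        forwardKeyLengths(p).forEach((k, n) -> System.out.println(k + " -> max " + n + " chars"));
        System.out.println("reverse keys are forward keys: " + reverseKeysAreForward(p));
        System.out.println("collection: " + p.getProperty("trec.collection.class"));
    }
}
```

If a key is added to indexer.meta.forward.keys, forwardKeyLengths fails until a matching length is added to indexer.meta.forward.keylens. Catching that mismatch early is the point of the check.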