WT10G is a general Web crawl used by the TREC Web track in 2000 and 2001. It can be obtained from the University of Glasgow. The topics and qrels are available from the TREC website.
Indexing the WT10G collection with Terrier is straightforward: no changes are needed to the default terrier.properties file created by trec_setup.
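As a rough sketch of the indexing step described above, the commands might look like the following. It assumes a Terrier release that ships the `bin/trec_setup.sh` and `bin/trec_terrier.sh` scripts; the `TERRIER_HOME` and `WT10G_PATH` variables and the `STATUS` flag are illustrative names that are not part of Terrier, and the paths are placeholders to replace with your own locations.

```shell
# Assumed locations (placeholders); adjust to your installation.
TERRIER_HOME="${TERRIER_HOME:-/path/to/terrier}"
WT10G_PATH="${WT10G_PATH:-/path/to/WT10G}"

if [ -x "$TERRIER_HOME/bin/trec_setup.sh" ]; then
    # Run from the Terrier directory: generate the default configuration
    # for the collection, then build the index with those defaults.
    (cd "$TERRIER_HOME" && bin/trec_setup.sh "$WT10G_PATH" && bin/trec_terrier.sh -i)
    STATUS="attempted"
else
    # Nothing is run when the scripts cannot be found.
    echo "Terrier scripts not found under $TERRIER_HOME; set TERRIER_HOME first." >&2
    STATUS="skipped"
fi
```

The guard only keeps the snippet harmless on a machine without Terrier; on a working installation the two script calls inside the subshell are the part that matters.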