Web Research Collections

(TREC Web, Terabyte & Blog Tracks)

TREC Web - TREC Terabyte - TREC Blogs




The University of Glasgow has taken over distribution of the WT2g, WT10g, .GOV and .GOV2 Web Research Collections from CSIRO (Commonwealth Scientific and Industrial Research Organisation). CSIRO had been distributing these collections to organizations and individuals engaged in research and development of natural language processing, information retrieval or document understanding systems, strictly for research purposes. These collections have been used in the TREC Web & Terabyte tracks.


In addition, as part of the TREC Blog track, the University of Glasgow is currently distributing the Blogs06 & Blogs08 test collections.


If you are experimenting with information retrieval systems in a Web or blog context, or are interested in the design and evaluation of large-scale information retrieval systems, these collections are likely to be useful. Queries and relevance assessments for these collections are available from the TREC website, so you can use them to tune and evaluate your system or approach.
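As a rough illustration of the evaluation step mentioned above, the following Python sketch computes precision at k from relevance assessments and a system ranking. It assumes the usual whitespace-separated TREC layouts (qrels lines as "topic iteration docno relevance", run lines as "topic Q0 docno rank score tag"). The topic number, document identifiers, scores and function names are made up for the example and do not come from any of the collections. In practice, the standard trec_eval tool is normally used for this.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Map each topic to the set of docnos judged relevant (relevance > 0).

    Assumes TREC-style lines: topic, iteration, docno, relevance.
    """
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, rel = parts
        if int(rel) > 0:
            relevant[topic].add(docno)
    return relevant


def parse_run(lines):
    """Map each topic to its docnos, ordered by descending score.

    Assumes TREC-style lines: topic, Q0, docno, rank, score, tag.
    """
    scored = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        topic, _q0, docno, _rank, score, _tag = parts
        scored[topic].append((float(score), docno))
    return {
        topic: [docno for _score, docno in sorted(docs, key=lambda p: -p[0])]
        for topic, docs in scored.items()
    }


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are judged relevant."""
    return sum(1 for docno in ranking[:k] if docno in relevant) / k


# Hypothetical sample data, for illustration only.
SAMPLE_QRELS = [
    "701 0 DOC-A 1",
    "701 0 DOC-B 0",
    "701 0 DOC-C 2",
]
SAMPLE_RUN = [
    "701 Q0 DOC-B 1 9.5 demo",
    "701 Q0 DOC-C 2 7.1 demo",
    "701 Q0 DOC-A 3 3.0 demo",
    "701 Q0 DOC-D 4 1.0 demo",
]

qrels = parse_qrels(SAMPLE_QRELS)
run = parse_run(SAMPLE_RUN)
p_at_2 = precision_at_k(run["701"], qrels.get("701", set()), 2)
print(f"P@2 for topic 701: {p_at_2:.2f}")
```

Treating any positive relevance value as relevant is a simplification. Some tracks use graded or special-valued judgments, so check the track guidelines before relying on it.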


Current information:

1.    Getting access to the test collections (including .GOV, .GOV2, Blogs06, and Blogs08).

2.    Specific information about the .GOV test collection.

3.    Specific information about the .GOV2 test collection.

4.    Specific information about the Blogs06 test collection.

5.    Specific information about the Blogs08 test collection.






·      Topics (queries) and relevance judgments for all collections can be found on the TREC website.

·      Updated inlinks.gz for WT10g.

·      Updated inlinks.gz for WT2g.

Last Updated: 8th February 2011

Email: test_collections (AT) dcs (DOT) gla (DOT) ac (DOT) uk