The University of Glasgow has taken over the distribution of the
WT2g, WT10g, .GOV and .GOV2 Web Research Collections from CSIRO
(Commonwealth Scientific and Industrial Research Organisation).
CSIRO had been distributing the Web Research collections to organizations
and individuals engaged in research and development of natural language
processing, information retrieval or document understanding systems,
strictly for research purposes. These collections have been used in the
TREC Web & Terabyte tracks.
In addition, as part of
the TREC Blog track, the
University of Glasgow is currently distributing the Blogs06 & Blogs08 test
collections.
If you are experimenting with information retrieval systems in a Web or
blog context, or are interested in the design and evaluation of
large-scale information retrieval systems, these collections are likely
to be useful. Queries and relevance assessments for these collections are
available from the TREC Web page, so you can use them to tune or evaluate
your system or approach.
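As a rough illustration of using relevance assessments to evaluate a system, the sketch below computes mean average precision for a ranked run. It assumes the assessments come as whitespace-separated lines of topic, iteration, document identifier and relevance grade, which is the usual TREC qrels layout; check the files you download before relying on it. The topic numbers, document identifiers and function names are made up for the example, and this is not a replacement for the official evaluation tools.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse qrels lines of the assumed form: topic iteration docno relevance."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, rel = parts
        qrels[topic][docno] = int(rel)
    return qrels


def average_precision(ranked_docnos, judged):
    """Average precision of one ranked list against one topic's judgments."""
    # Any grade above zero counts as relevant; unjudged documents count as
    # non-relevant.
    num_relevant = sum(1 for rel in judged.values() if rel > 0)
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, docno in enumerate(ranked_docnos, start=1):
        if judged.get(docno, 0) > 0:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant


def mean_average_precision(run, qrels):
    """Mean of average precision over all topics that have judgments."""
    scores = [
        average_precision(run.get(topic, []), judged)
        for topic, judged in qrels.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical judgments and system output, for illustration only.
SAMPLE_QRELS = [
    "701 0 DOC-A 1",
    "701 0 DOC-B 0",
    "701 0 DOC-C 2",
    "702 0 DOC-D 1",
]
SAMPLE_RUN = {
    "701": ["DOC-A", "DOC-B", "DOC-C"],
    "702": ["DOC-X", "DOC-D"],
}

qrels = parse_qrels(SAMPLE_QRELS)
map_score = mean_average_precision(SAMPLE_RUN, qrels)
print(f"MAP = {map_score:.4f}")
```

Comparing this score across parameter settings on one set of topics, then checking the chosen setting on a different set of topics, is one simple way to tune a system without overfitting to the assessments.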
Current information:
1. Getting access to the test collections (including .GOV, .GOV2, Blogs06, and Blogs08)
2. Specific information about the .GOV test collection
3. Specific information about the .GOV2 test collection
4. Specific information about the Blogs06 test collection
5. Specific information about the Blogs08 test collection
Notes:
· Topics (queries) and relevance judgments for all collections can be found on the TREC website.
· Updated inlinks.gz for WT10g.
· Updated inlinks.gz for WT2g.
Last Updated: 8th February 2011
Email: test_collections (AT) dcs (DOT) gla (DOT) ac (DOT) uk