Access to Web/Blog Research Collections



These datasets are distributed by the University of Glasgow to support research on information retrieval and related technologies. All of the collections have been, or are currently being, used by several tracks of the TREC conference.


Medium of Distribution


We use 2.5" or 3.5" SATA hard disk drives for distributing collections too large for DVD or CD-ROM: GOV2, Blogs06 and Blogs08 will be available ONLY on this medium. You will need to specify whether you want the drive formatted with a Linux or a Windows file system. If you receive a collection on a hard drive, you will need to install the drive in your system.


If you don't have spare slots in your machine, consider using an external hard disk enclosure with a USB 2.0 or FireWire interface. They're available quite cheaply, and we use one for writing the disks. The hard drive is yours to keep.


Available Collections


The Web/Blog research collections are distributed by the University of Glasgow for research purposes only. In order to receive copies of one or more of these collections, you must sign an agreement with the University of Glasgow and pay a contribution to the University's various costs in preparing, maintaining and distributing the data.


Web Collections:




Sample Documents


426 GB




18 GB




10 GB




2 GB



Blog Collections:




Sample Documents


148 GB




2.25 TB





Obtaining Test Collections


To obtain a test collection, please follow the steps described below.


Information on Agreements (IMPORTANT: PLEASE READ):


Please note that the organisation agreements are normally signed for one research group or a small unit within a legal entity, and not for the whole entity. The licensed group is usually a small and homogeneous group of researchers working together on the same topic and within the same location.


For example, the license could be for the Information Retrieval research group of the Department of Computer Science of the University X. In this case, the “Organisation” on the license is the Information Retrieval group of the Department of Computer Science, while the “Corporation/Legal Entity” is the University X. The Machine Learning research group of the same Department will need to buy another license. 


Steps to obtain the collections:


  1. Print, complete and sign the relevant organisational and individual agreements.


The organisational agreement must be signed by a person with authority to do so on behalf of your organisation. This person should place their initials on each page of the agreement (see the “Initials” field at the bottom right corner of each page).



  2. Complete, print and sign the following requisition form.



  3. Email both the organisational agreement (ALL FOUR PAGES) and the requisition form, to, specifying:



  4. A separate individual agreement must be completed and signed by each person within your organisation who is given access to the data. You must retain these signed individual agreements within your organisation. DO NOT send us individual agreements.




1.    We cannot ship the collections until we have both received your signed organisation agreement in good order (see Information on Agreements above) and cleared your payment.

2.    Payment is made by electronic transfer to the University of Glasgow's bank account; payment by cheque is also possible.


If you are in a hurry: 


Please ensure that you complete all of the above steps as early as possible!  The most common causes of delayed shipment are: 


We can usually process and ship standard requests within a day or two of clearing your payment. However, while we make every effort to ship data quickly, note that i) distributing data is not our only job, and ii) other groups may be ahead of you in the data distribution queue.


Requests that do not comply with the above guidelines will not be processed.


What Happens Next?

  1. You will receive an email confirmation of your order. This will include details of how to make your payment if you have chosen to pay by bank transfer.
  2. An invoice/receipt will be sent to you at the address specified. If you are in a hurry for the data, you may pay before receiving our official invoice. Note: In our invoicing system, amounts appear in United Kingdom pounds. For example, if your invoice says £350 it is 350 United Kingdom pounds.
  3. The data will not be shipped until we have received full payment, AND we have received your organisation's signed agreement in good order (see Information on Agreements above), AND we have received an appropriate shipping address.
  4. Data will be shipped using express post (Royal Mail 1st class and signed post). If you prefer another, more expensive means of shipping (e.g. DHL/UPS), this can be accommodated, provided that the requester covers the corresponding costs.


Please note that the current fees are at their lowest possible values. We regret to inform you that we are unable to offer any reduction on the fees, even for those organisations based in developing countries.
