Access to Web/Blog Research Collections

These datasets are distributed by the University of Glasgow to support research in information retrieval and related technologies. All of the collections have been, or are being, used by several tracks of the TREC conference.

Medium of Distribution

We distribute collections that are too large for DVD or CD-ROM on 3.5" SATA hard disk drives: GOV2, Blogs06 and Blogs08 are available ONLY on this medium. When ordering, please specify whether you need a Linux or a Windows file system on the drive. If you receive a collection on a hard drive, you will need to install the drive in your system.

If you don't have a spare drive bay in your machine, consider using an external hard disk enclosure with a USB 2.0 or FireWire interface. These are inexpensive, and we use one ourselves to write the disks. The hard drive is yours to keep.

Available Collections

The Web/Blog research collections are distributed by the University of Glasgow for research purposes only. To receive copies of one or more of these collections, you must sign an agreement with the University of Glasgow and pay a contribution towards the University's costs of preparing, maintaining and distributing the data.

Web Collections:

Collection   Size      Fee    Sample Documents
.GOV2        426 GB    £650   -
.GOV         18 GB     £500   .GOV
WT10g        10 GB     £500   wt10g
WT2g         2 GB      £350   wt2g

Blog Collections:

Collection   Size      Fee    Sample Documents
Blogs06      148 GB    £500   -
Blogs08      2.25 TB   £600   -

Notes:


Obtaining Test Collections

To obtain a test collection, please follow the steps described below.

Information on Agreements (IMPORTANT: PLEASE READ):

Please note that organisation agreements are normally signed for one research group or a small unit within a legal entity, not for the whole entity. The licensed group is usually a small, homogeneous group of researchers working together on the same topic and at the same location.

For example, the license could be for the Information Retrieval research group of the Department of Computer Science of University X. In this case, the “Organisation” on the license is the Information Retrieval group of the Department of Computer Science, while the “Corporation/Legal Entity” is University X. The Machine Learning research group of the same Department would need to purchase a separate license.

Steps to obtain the collections:

  1. Print, complete and sign the relevant organisational and individual agreements.

     The organisational agreement must be signed by a person with authority to do so on behalf of your organisation. This person should also initial each page of the agreement (see the “Initials” field at the bottom right corner of each page).

  2. Complete, print and sign the following requisition form.

  3. Fax BOTH the signed and initialled organisational agreement (ALL FOUR pages) AND the completed requisition form to +44 141 330 4439, Att: May Gallagher.

  4. After you have faxed the organisational agreement and the payment forms, please place your order by email to test_collections@dcs.gla.ac.uk specifying:

  5. A separate individual agreement must be completed and signed by each person within your organisation who is given access to the data. You must retain these signed individual agreements within your organisation. DO NOT send us individual agreements.

Notes:

1. We cannot ship the collections until we have both received your signed organisation agreement in good order (see Information on Agreements above) and cleared your payment. Note that credit card payments clear significantly faster than bank transfers.

2. The amount payable is given in the table above. Our preferred method of payment is credit card (Visa or MasterCard). Payment by transfer to the University of Glasgow's bank account or by cheque is also possible.

3. Please note that a 2% charge will be made for handling credit card payments.

4. Please also add an administrative fee of 30 GBP for every bank transfer transaction.
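The payment notes above can be turned into a quick cost estimate. The Python sketch below is a hypothetical helper, not an official University of Glasgow calculator. It assumes that the 2% credit card charge is applied to the total of the collection fees, that the 30 GBP administrative fee is added once per bank transfer, and that cheque payments carry no extra charge (the notes mention none). The names `FEES_GBP` and `total_cost` are illustrative only; the fee values are copied from the tables above, and your invoice remains the authoritative figure.

```python
from decimal import Decimal, ROUND_HALF_UP

# Collection fees in GBP, copied from the tables above.
FEES_GBP = {
    ".GOV2": Decimal("650"),
    ".GOV": Decimal("500"),
    "WT10g": Decimal("500"),
    "WT2g": Decimal("350"),
    "Blogs06": Decimal("500"),
    "Blogs08": Decimal("600"),
}

CARD_SURCHARGE = Decimal("0.02")   # 2% handling charge on credit card payments
BANK_TRANSFER_FEE = Decimal("30")  # 30 GBP administrative fee per bank transfer


def total_cost(collections, method, transfers=1):
    """Hypothetical helper: estimate the total payable in GBP.

    Assumptions (not confirmed by the ordering page): the 2% card charge
    applies to the sum of the collection fees, the 30 GBP fee is charged
    once per bank transfer, and cheques carry no extra charge.
    """
    subtotal = sum((FEES_GBP[name] for name in collections), Decimal("0"))
    if method == "card":
        total = subtotal * (1 + CARD_SURCHARGE)
    elif method == "transfer":
        total = subtotal + BANK_TRANSFER_FEE * transfers
    elif method == "cheque":
        total = subtotal
    else:
        raise ValueError(f"unknown payment method: {method!r}")
    # Round to the nearest penny.
    return total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(total_cost([".GOV2"], "card"))                   # 663.00
    print(total_cost(["Blogs06", "Blogs08"], "transfer"))  # 1130.00
```

Using `Decimal` rather than `float` keeps penny amounts exact, which makes it easier to compare the estimate against an invoice.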

 

If you are in a hurry:

Please ensure that you complete all of the above steps as early as possible! The most common causes of delayed shipment are:

We can usually process and ship standard requests within a day or two of clearing your payment. However, while we make every effort to ship data quickly, note that (i) distributing data is not our only job, and (ii) other groups may be ahead of you in the data distribution queue.

Requests that do not comply with the above guidelines will not be processed.

What Happens Next?

  1. You will receive an email confirmation of your order. This will include details of how to make your payment if you have chosen to pay by bank transfer.
  2. An invoice/receipt will be sent to you at the address specified. If you are in a hurry for the data, you may pay before receiving our official invoice. Note: amounts in our invoicing system are in pounds sterling (GBP); for example, an invoice for £350 means 350 pounds sterling.
  3. The data will not be shipped until we have received full payment, AND we have received your organisation's signed agreement in good order (see Information on Agreements above), AND we have received an appropriate shipping address.
  4. Data will be shipped by express post (Royal Mail 1st class, signed for). If you prefer another, more expensive means of shipping (e.g. DHL or UPS), this can be accommodated, provided that the requester covers the corresponding costs.

Please note that the current fees are already as low as we can make them. We regret that we are unable to offer any reduction on the fees, even for organisations based in developing countries.



test_collections@dcs.gla.ac.uk