These datasets are distributed by the University of Glasgow to support research on
information retrieval and related technologies. All collections have been, or are
being, used by several tracks of the TREC conference.
Medium of Distribution
We use 3.5" SATA hard disk drives to distribute collections too large for
DVD or CD-ROM: GOV2, Blogs06 and Blogs08 will be available ONLY on this
medium. You will need to specify either a Linux or a Windows file
system. If you receive a collection on a hard drive, you will need to
install it in your system.
If you don't have spare slots in your machine, consider using an external hard
disk enclosure with a USB 2.0 or FireWire interface. They're available quite
cheaply, and we use one for writing the disks. The hard drive is yours to keep.
Available Collections
The Web/Blog research collections are
distributed by the
Web Collections:
| Collection | Size | Fee | Sample Documents |
| .GOV2 | 426 GB | £600 | - |
| .GOV | 18 GB | £400 | |
| WT10g | 10 GB | £400 | |
| WT2g | 2 GB | £250 | |
Blog Collections:
| Collection | Size | Fee | Sample Documents |
| Blogs06 | 148 GB | £400 | - |
| Blogs08 | 2.25 TB | £500 | - |
Notes:
Obtaining Test Collections
To obtain a test collection, please follow the steps described below.
Information on Agreements (IMPORTANT - PLEASE READ):
Please note
that the agreements are normally signed for one research group or a small unit
within a legal entity, and not for the whole entity. The licensed group is
usually a small and homogeneous group of researchers working together on the
same topic and within the same location.
For example, the license could be for the Information Retrieval research group of the
Department of Computer Science of University X. In this case, the “Organisation” on
the license is the Information Retrieval group of the Department of Computer
Science, while the “Corporation/Legal Entity” is University X. The
Machine Learning research group of the same Department would need to buy a separate
license.
Steps to obtain the collections:
The organisational agreement must be signed by a person
with authority to do so on behalf of your organisation.
This person should also place his/her initials on each page of the agreement
(see the “Initials” field at the bottom right corner of each page).
Notes:
1. We cannot ship the collections until we have both received your signed organisation agreement in good order (see Information on Agreements above) and cleared your payment. Note that credit card payments are significantly quicker to clear than bank transfers.
2. For the amount due, see the tables above. The preferred method of payment is by credit card (Visa or MasterCard). Payment by transfer to the University of Glasgow's bank account or by cheque is also possible.
3. Please note that a 2% charge will be made for handling credit card payments.
4. Please also add 30 GBP in administrative fees for every bank transfer transaction.
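As a rough illustration of how notes 3 and 4 affect the amount payable, the sketch below computes an order total in Python. It is not an official calculator. The function name `total_due`, the payment method labels, and rounding to the nearest penny are all hypothetical. It also assumes that the 2% card charge applies to the sum of the collection fees, that an order is paid in a single bank transfer, and that a cheque carries no surcharge because none is stated here.

```python
from decimal import Decimal, ROUND_HALF_UP

# Collection fees in GBP, copied from the tables on this page.
COLLECTION_FEES = {
    ".GOV2": Decimal("600"),
    ".GOV": Decimal("400"),
    "WT10g": Decimal("400"),
    "WT2g": Decimal("250"),
    "Blogs06": Decimal("400"),
    "Blogs08": Decimal("500"),
}

CARD_CHARGE_RATE = Decimal("0.02")  # 2% handling charge for credit cards (note 3)
BANK_TRANSFER_FEE = Decimal("30")   # GBP per bank transfer transaction (note 4)


def total_due(collections, method):
    """Return the estimated total in GBP for an order.

    `method` is one of "card", "transfer" or "cheque" (hypothetical labels).
    Assumes one transaction per order and no surcharge for cheques.
    """
    base = sum((COLLECTION_FEES[name] for name in collections), Decimal("0"))
    if method == "card":
        extra = base * CARD_CHARGE_RATE
    elif method == "transfer":
        extra = BANK_TRANSFER_FEE
    elif method == "cheque":
        extra = Decimal("0")
    else:
        raise ValueError(f"unknown payment method: {method!r}")
    # Rounding to the nearest penny is an assumption, not a stated policy.
    return (base + extra).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    print(total_due([".GOV2"], "card"))      # 600 plus 2% card charge
    print(total_due(["WT2g"], "transfer"))   # 250 plus 30 transfer fee
```

Under these assumptions, the fixed 30 GBP transfer fee exceeds the 2% card charge for any order under £1,500, which covers every single collection listed above.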
If you are in a hurry:
Please
ensure that you complete all of the above steps as early as possible! The
most common causes of delayed shipment are:
We
can usually process and ship standard requests within a day or two of clearing
your payment. However, while we make every effort to ship data quickly, note that
i) distributing data is not our only job, and ii)
other groups may be ahead of you in the data distribution queue.
Requests
that do not comply with the above guidelines will not be processed.
What Happens Next?
Please note that the current fees are at
their lowest possible values. We regret to inform you that we are unable to
offer any reduction on the fees, even for those organisations
based in developing countries.