TREC Disks 1 & 2
TREC Disks 1 & 2 are the original TREC test collections.
Indexing Disks 1 & 2 is straightforward with Terrier. Only one property in terrier.properties needs to change from the defaults created by trec_setup, as follows:
#skip indexing some tags for these corpora
TrecDocTags.process=TEXT,TITLE,HEAD,HL
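If you prefer to script this change rather than edit the file by hand, the following is a minimal sketch in Python. The helper name `set_property` is hypothetical and not part of Terrier, and the file location `etc/terrier.properties` shown in the usage comment is an assumption about your installation layout. The helper only handles simple single-line `key=value` entries.

```python
import re


def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties_text with `key` set to `value`.

    Hypothetical helper (not part of Terrier). If an uncommented
    `key=...` line already exists, the first such line is replaced;
    otherwise a new line is appended at the end.
    """
    new_line = f"{key}={value}"
    # Match an uncommented "key = anything" entry on a single line.
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(properties_text):
        return pattern.sub(lambda _match: new_line, properties_text, count=1)
    # Make sure the appended entry starts on its own line.
    separator = "" if properties_text == "" or properties_text.endswith("\n") else "\n"
    return properties_text + separator + new_line + "\n"


# Example usage (the path is an assumption; adjust to your installation):
#
#   from pathlib import Path
#   path = Path("etc/terrier.properties")
#   updated = set_property(path.read_text(), "TrecDocTags.process", "TEXT,TITLE,HEAD,HL")
#   path.write_text(updated)
```

Replacing an existing entry instead of always appending avoids ending up with two conflicting definitions of the same property in the file.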
If your copy of the collection is compressed with .gz extensions, Terrier can read it without any further configuration.
On the other hand, if your copy of the collection is compressed with .Z or .z extensions, Terrier (version 5.2 onwards) needs some additional configuration to be able to read it: