TREC Disks 4 & 5
TREC Disks 4 & 5 are the main adhoc TREC test collections that followed Disks 1 & 2.
Indexing Disks 4 & 5 is easy with Terrier. Only one property in terrier.properties needs to be altered from the default created by trec_setup, as follows:
#skip indexing some tags for these corpora
TrecDocTags.process=TEXT,H3,DOCTITLE,HEADLINE,TTL
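As we understand it, TrecDocTags.process acts as a whitelist: only text inside the listed tags is indexed, and the contents of other tags (for example PROFILE) are skipped. The Python sketch below only illustrates that idea. It is not Terrier's parser. The function name indexable_text is hypothetical, and the sketch assumes simple, well-formed SGML in which a whitelisted tag is not nested inside another whitelisted tag of the same name. Markup nested inside a kept tag, such as <P>, is stripped and only its text is kept.

```python
import re

# Tags whose contents are kept, mirroring the TrecDocTags.process value above.
PROCESS_TAGS = ["TEXT", "H3", "DOCTITLE", "HEADLINE", "TTL"]


def indexable_text(doc, tags=PROCESS_TAGS):
    """Return the text found inside whitelisted tags, in document order.

    Hypothetical illustration only: a regex-based approximation of
    "index only these tags", not how Terrier actually parses documents.
    """
    names = "|".join(re.escape(t) for t in tags)
    # Match <TAG>...</TAG> non-greedily; \1 requires the same closing tag name.
    pattern = re.compile(r"<(%s)>(.*?)</\1>" % names, re.DOTALL)
    chunks = []
    for match in pattern.finditer(doc):
        # Replace any nested markup (e.g. <P>) with spaces, then collapse whitespace.
        text = re.sub(r"<[^>]+>", " ", match.group(2))
        chunks.append(" ".join(text.split()))
    return chunks
```

For an invented toy document, the DOCNO and PROFILE contents are dropped and only the HEADLINE and TEXT contents survive:

```python
doc = """<DOC>
<DOCNO> FT911-3 </DOCNO>
<PROFILE>_AN-BEOA7AAIFT</PROFILE>
<HEADLINE> FT 14 MAY 91 / Example headline </HEADLINE>
<TEXT>
Some <P>body</P> text.
</TEXT>
</DOC>"""
print(indexable_text(doc))  # ['FT 14 MAY 91 / Example headline', 'Some body text.']
```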
We do not typically include the Congressional Record when indexing Disks 4 & 5. See Query performance prediction, B.H. He & I. Ounis, Information Systems 31(7), pp. 585-594, 2006. http://portal.acm.org/citation.cfm?id=1226381
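One way to leave the Congressional Record out is to remove its files from the collection.spec file list that trec_setup generated before indexing. The Python sketch below does this filtering. It rests on two assumptions that you should check against your own setup. First, it assumes that collection.spec lists one file path per line and that lines starting with # are comments. Second, it assumes that the Congressional Record files sit under a directory named cr on Disk 4. The script name drop_cr.py and all example paths are hypothetical.

```python
import sys
from pathlib import PurePath


def is_congressional_record(path):
    """True if any directory component of path is named 'cr' (any case).

    Assumption: the Congressional Record lives under a 'cr' directory on
    Disk 4; adjust this test to match how your copy of the disks is laid out.
    """
    # parts[:-1] skips the file name itself, so only directories are checked.
    return any(part.lower() == "cr" for part in PurePath(path).parts[:-1])


def filter_spec(lines):
    """Drop Congressional Record entries from collection.spec lines.

    Blank lines and '#' comment lines are kept unchanged.
    """
    kept = []
    for line in lines:
        entry = line.strip()
        if entry and not entry.startswith("#") and is_congressional_record(entry):
            continue  # skip a CR file
        kept.append(line)
    return kept


if __name__ == "__main__":
    # Hypothetical usage:
    #   python drop_cr.py etc/collection.spec > collection.spec.new
    # then replace etc/collection.spec with collection.spec.new.
    with open(sys.argv[1]) as spec:
        sys.stdout.writelines(filter_spec(spec))
```

The script writes the result to a new file rather than editing in place. Redirecting output onto the file being read would truncate it before it is read.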