Diff for "Terrier/Disks1&2"

Differences between revisions 8 and 9

Line 19 (revision 8):
On the other hand, if your copy of the collection is compressed with .Z extensions, you will need some additional configuration for Terrier (version 5.2 onwards):
Line 19 (revision 9):
On the other hand, if your copy of the collection is compressed with .Z or .z extensions, you will need some additional configuration for Terrier (version 5.2 onwards) to be able to read them:
Line 23 (revision 8):
files.mappings=Z:org.apache.commons.compress.compressors.z.ZCompressorInputStream:null
Line 23 (revision 9):
files.mappings=Z:org.apache.commons.compress.compressors.z.ZCompressorInputStream:null,z:org.apache.commons.compress.compressors.z.ZCompressorInputStream:null

TREC Disks 1 & 2

TREC Disks 1 & 2 are the original TREC test collections.

Indexing Disks 1 & 2 is easy with Terrier. Only one property in terrier.properties needs to be altered from the default created by trec_setup, as follows:


#skip indexing some tags for these corpora
TrecDocTags.process=TEXT,TITLE,HEAD,HL

Compressed Files

If your copy of the collection is compressed with .gz extensions, Terrier can read it without any additional configuration.

On the other hand, if your copy of the collection is compressed with .Z or .z extensions, you will need some additional configuration for Terrier (version 5.2 onwards) to be able to read them:

terrier.mvn.coords=org.apache.commons:commons-compress:1.18
files.mappings=Z:org.apache.commons.compress.compressors.z.ZCompressorInputStream:null,z:org.apache.commons.compress.compressors.z.ZCompressorInputStream:null
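The files.mappings value above is a comma-separated list with one entry per file extension. As a rough sketch only, the following Python helper checks which of the .Z and .z extensions occur in a list of file names and prints a matching files.mappings line. The function names and the sample file names are hypothetical and not part of Terrier; the class name and the trailing "null" field are copied unchanged from the configuration above.

```python
from pathlib import PurePath

# Input stream class copied from the configuration above.
Z_INPUT_STREAM = "org.apache.commons.compress.compressors.z.ZCompressorInputStream"


def z_extensions(filenames):
    """Return the distinct .Z/.z extensions (without the dot) found in filenames, sorted."""
    found = set()
    for name in filenames:
        suffix = PurePath(name).suffix  # e.g. ".Z", ".gz", or "" if there is none
        if suffix in (".Z", ".z"):
            found.add(suffix[1:])
    return sorted(found)


def files_mappings_line(extensions):
    """Build a files.mappings property line with one entry per extension."""
    # The trailing "null" field is kept exactly as in the configuration above.
    entries = [f"{ext}:{Z_INPUT_STREAM}:null" for ext in extensions]
    return "files.mappings=" + ",".join(entries)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    sample = ["WSJ/WSJ_0001.Z", "AP/AP_0001.z", "FR/FR_0001.gz"]
    print(files_mappings_line(z_extensions(sample)))
```

The printed line can be pasted into terrier.properties alongside the terrier.mvn.coords setting; .gz files are ignored here because they need no mapping.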

last edited 2019-06-05 08:16:13 by CraigMacdonald