I'll begin with the Moffat research paper. The idea is to embed an internal index in each inverted list, and also to compress the lists with Golomb coding.
I'll proceed in steps for this paper. The paper recommends two types of compression: gamma and Golomb.
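To make the "internal index" idea concrete, here is a minimal sketch of a posting list with skip entries. It is an illustration only: the class name SkippedListSketch and the blockSize parameter are hypothetical, and the postings are kept as a plain sorted int array, whereas in a compressed list the skip entries would also need to record bit offsets.

```java
/** Hypothetical sketch: a posting list with one skip entry per block of postings. */
final class SkippedListSketch {
    private final int[] docIds;      // postings, sorted ascending
    private final int[] skipDocIds;  // first docid of each block (the internal index)
    private final int blockSize;

    SkippedListSketch(int[] sortedDocIds, int blockSize) {
        if (blockSize < 1) throw new IllegalArgumentException("blockSize must be >= 1");
        this.docIds = sortedDocIds.clone();
        this.blockSize = blockSize;
        int blocks = (docIds.length + blockSize - 1) / blockSize;
        this.skipDocIds = new int[blocks];
        for (int b = 0; b < blocks; b++) {
            skipDocIds[b] = docIds[b * blockSize];
        }
    }

    /** Looks up a docid by walking the skip entries, then scanning a single block. */
    boolean contains(int target) {
        int block = -1;
        // Advance while the next block starts at or before the target.
        while (block + 1 < skipDocIds.length && skipDocIds[block + 1] <= target) {
            block++;
        }
        if (block < 0) return false; // target is smaller than every posting
        int start = block * blockSize;
        int end = Math.min(start + blockSize, docIds.length);
        for (int i = start; i < end; i++) {
            if (docIds[i] == target) return true;
            if (docIds[i] > target) return false;
        }
        return false;
    }

    public static void main(String[] args) {
        SkippedListSketch list = new SkippedListSketch(new int[] {2, 5, 9, 14, 20, 31, 47}, 3);
        System.out.println(list.contains(20));
        System.out.println(list.contains(10));
    }
}
```

With a block size of 3, a lookup touches at most the skip entries plus three postings, instead of the whole list. The right block size is a tuning question that this sketch does not address.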
Step 1: I want to use gamma coding, which is already implemented in Terrier. I'll change the data stored in the inverted lists, which means modifying how these data are written and read. I'll modify three files for this: invertedIndex.java, invertedIndexBuilder.java and structureMerger.java.
Discussed today, please update.
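For reference, Step 1 can be pictured with a small standalone sketch of Elias gamma coding applied to docid gaps. This is not Terrier's code: the class GammaSketch is hypothetical, bits are kept in a String for readability, and the convention used (a run of zeros followed by the binary value) is an assumption that may differ from the bit layout Terrier writes. Docids are assumed to be positive and strictly increasing.

```java
/** Hypothetical sketch of Elias gamma coding over docid gaps (bits held in a String). */
final class GammaSketch {
    /** Encodes x >= 1 as floor(log2 x) zeros followed by x in binary. */
    static String encode(int x) {
        if (x < 1) throw new IllegalArgumentException("gamma codes positive integers only");
        String binary = Integer.toBinaryString(x);
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < binary.length(); i++) sb.append('0');
        return sb.append(binary).toString();
    }

    /** Decodes one code starting at pos[0] and advances pos[0] past it. */
    static int decode(String bits, int[] pos) {
        int zeros = 0;
        while (bits.charAt(pos[0]) == '0') { zeros++; pos[0]++; }
        int value = 0;
        for (int i = 0; i <= zeros; i++) {
            value = (value << 1) | (bits.charAt(pos[0]++) - '0');
        }
        return value;
    }

    /** Encodes strictly increasing positive docids as a sequence of gamma-coded gaps. */
    static String encodeDocIds(int[] docIds) {
        StringBuilder sb = new StringBuilder();
        int prev = 0;
        for (int id : docIds) {
            sb.append(encode(id - prev));
            prev = id;
        }
        return sb.toString();
    }

    /** Decodes count gaps and rebuilds the docids by running sum. */
    static int[] decodeDocIds(String bits, int count) {
        int[] ids = new int[count];
        int[] pos = {0};
        int prev = 0;
        for (int i = 0; i < count; i++) {
            prev += decode(bits, pos);
            ids[i] = prev;
        }
        return ids;
    }

    public static void main(String[] args) {
        // Gaps are 3, 4, 1, 12.
        System.out.println(encodeDocIds(new int[] {3, 7, 8, 20})); // prints 0110010010001100
    }
}
```

Small gaps get short codes (a gap of 1 costs a single bit), which is why the docids are stored as differences rather than as absolute values.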
Step 2: I want to implement Golomb compression and use it. I'll need to modify the same files, and also bitFile.java.
Terrier 2 contains an implementation of Golomb coding. However, it might add a nice dimension to your dissertation to discuss which works best.
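As a point of comparison for that discussion, here is a minimal standalone sketch of Golomb coding. It is not the Terrier 2 implementation: the class GolombSketch and its methods are hypothetical, bits are kept in a String, and the unary convention (ones terminated by a zero) is an assumption. The chooseB helper uses the commonly quoted rule of thumb b ≈ 0.69 × N / f_t (N documents, f_t documents containing the term); treat it as an assumption and check it against the paper.

```java
/** Hypothetical sketch of Golomb coding with parameter b (bits held in a String). */
final class GolombSketch {
    /** Smallest k such that 2^k >= b, for b >= 1. */
    static int ceilLog2(int b) {
        return 32 - Integer.numberOfLeadingZeros(b - 1);
    }

    /** Rule-of-thumb parameter choice (assumption): b is about 0.69 * N / f_t. */
    static int chooseB(long numDocs, long docFreq) {
        return Math.max(1, (int) Math.round(0.69 * numDocs / docFreq));
    }

    /** Encodes x >= 1: quotient in unary, remainder in truncated binary. */
    static String encode(int x, int b) {
        if (x < 1 || b < 1) throw new IllegalArgumentException("x and b must be >= 1");
        int q = (x - 1) / b;
        int r = (x - 1) % b;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < q; i++) sb.append('1');
        sb.append('0');
        int k = ceilLog2(b);
        int cutoff = (1 << k) - b; // remainders below cutoff need only k-1 bits
        if (r < cutoff) appendBits(sb, r, k - 1);
        else appendBits(sb, r + cutoff, k);
        return sb.toString();
    }

    /** Decodes one code starting at pos[0] and advances pos[0] past it. */
    static int decode(String bits, int[] pos, int b) {
        int q = 0;
        while (bits.charAt(pos[0]++) == '1') q++;
        int k = ceilLog2(b);
        int cutoff = (1 << k) - b;
        int r = 0;
        for (int i = 0; i < k - 1; i++) {
            r = (r << 1) | (bits.charAt(pos[0]++) - '0');
        }
        if (k > 0 && r >= cutoff) { // long form: read one more bit
            r = ((r << 1) | (bits.charAt(pos[0]++) - '0')) - cutoff;
        }
        return q * b + r + 1;
    }

    private static void appendBits(StringBuilder sb, int value, int width) {
        for (int i = width - 1; i >= 0; i--) sb.append((value >> i) & 1);
    }

    public static void main(String[] args) {
        System.out.println(encode(9, 3)); // prints 11011
    }
}
```

Unlike gamma, Golomb needs the parameter b at decode time, so b has to be either stored with each list or recomputed from statistics that are already available when the list is read.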
Step 3: The paper deals with the cosine measure, so I'll probably implement it for matching.
Cosine is an old-fashioned ranking function; it's more interesting to apply the techniques to DFR weighting models such as PL2.
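As a rough illustration of what a DFR model computes per term and document, here is a standalone sketch of the PL2 formula as it is usually published (Poisson model, Laplace after-effect, normalisation 2). It is not Terrier's weighting model class: the class Pl2Sketch and all parameter names are hypothetical, the query term weight is left out, and the formula should be checked against the DFR literature before it is relied on.

```java
/** Hypothetical sketch of the PL2 score for one term in one document. */
final class Pl2Sketch {
    private static final double LOG2_E = 1.0 / Math.log(2.0);

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * @param tf                 term frequency in the document
     * @param docLength          length of the document
     * @param avgDocLength       average document length in the collection
     * @param collectionTermFreq total occurrences of the term in the collection
     * @param numDocs            number of documents in the collection
     * @param c                  normalisation 2 parameter
     */
    static double score(double tf, double docLength, double avgDocLength,
                        double collectionTermFreq, double numDocs, double c) {
        if (tf <= 0) return 0.0;
        // Normalisation 2: rescale tf by document length.
        double tfn = tf * log2(1.0 + c * avgDocLength / docLength);
        // Mean of the Poisson model.
        double lambda = collectionTermFreq / numDocs;
        double information = tfn * log2(tfn / lambda)
                + (lambda - tfn) * LOG2_E
                + 0.5 * log2(2.0 * Math.PI * tfn);
        // Laplace after-effect: divide by tfn + 1.
        return information / (tfn + 1.0);
    }

    public static void main(String[] args) {
        System.out.println(score(3, 100, 100, 1000, 100000, 1.0));
    }
}
```

The score depends only on statistics that are available per posting (tf, document length) or per term (collection frequency), so it can be accumulated term by term in the same way as the cosine measure.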
For the other paper, I'll need to create another class, which is in charge of selecting documents so that the complete computation is not carried out over the whole collection.
This is a Matching subclass, which uses a PointeredInvertedIndex.