I'll begin with the Moffat research paper. Its idea is to embed an internal index (skip information) in each inverted list, so that parts of a list can be bypassed during query evaluation, and to compress the lists with Golomb coding.
I'll proceed step by step for this paper. The paper recommends two compression codes: Elias gamma and Golomb.
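For reference, here is a minimal textbook sketch of Elias gamma coding applied to the d-gaps of a posting list, written over a `java.util.BitSet`. The class and method names are hypothetical, and the bit conventions may differ from Terrier's own gamma implementation. The sketch assumes document ids start at 0, so the first gap is stored as docid + 1 because gamma cannot code zero.

```java
import java.util.BitSet;

/** Textbook Elias gamma coding of d-gaps; names are hypothetical, not Terrier's API. */
final class GammaSketch {
    /** Minimal append-only bit buffer with a separate read cursor. */
    static final class BitBuffer {
        final BitSet bits = new BitSet();
        int writePos = 0;
        int readPos = 0;

        void writeBit(boolean bit) { bits.set(writePos++, bit); }
        boolean readBit() { return bits.get(readPos++); }
    }

    /** Writes x >= 1 as floor(log2 x) zeros followed by x in binary (leading 1 included). */
    static void writeGamma(BitBuffer out, int x) {
        if (x < 1) throw new IllegalArgumentException("gamma codes positive integers only");
        int n = 31 - Integer.numberOfLeadingZeros(x); // floor(log2 x)
        for (int i = 0; i < n; i++) out.writeBit(false);
        for (int i = n; i >= 0; i--) out.writeBit(((x >>> i) & 1) == 1);
    }

    /** Counts zeros up to the first 1, then reads that many further bits. */
    static int readGamma(BitBuffer in) {
        int n = 0;
        while (!in.readBit()) n++;          // the loop also consumes the leading 1 of x
        int x = 1;
        for (int i = 0; i < n; i++) x = (x << 1) | (in.readBit() ? 1 : 0);
        return x;
    }

    public static void main(String[] args) {
        int[] docIds = {3, 7, 8, 20, 21};   // sorted, assumed to start at 0
        BitBuffer buf = new BitBuffer();
        int prev = -1;                      // first gap is docId + 1, since gamma cannot code 0
        for (int d : docIds) { writeGamma(buf, d - prev); prev = d; }
        System.out.println("bits used: " + buf.writePos); // gaps 4,4,1,12,1 -> 19 bits
        int cur = -1;
        StringBuilder decoded = new StringBuilder();
        for (int i = 0; i < docIds.length; i++) {
            cur += readGamma(buf);
            decoded.append(cur).append(' ');
        }
        System.out.println(decoded.toString().trim()); // 3 7 8 20 21
    }
}
```

A gap x costs 2*floor(log2 x) + 1 bits, so the small gaps of frequent terms are cheap, while rare terms with large gaps pay more.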
Step 1: I want to use gamma coding, which is already implemented in Terrier. I'll change the layout of the data in the inverted lists, modifying how that data is written and read. I have to write these classes:
SkipListInvertedIndexOutputStream: writes the compressed postings together with the skip-list structure defined in the Moffat paper.
[CM]: I don't think this will extend InvertedIndex.
interface PostingsStream: an interface for skipping within the postings of the inverted index.
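One possible shape for PostingsStream, with a simple in-memory implementation, is sketched below. The method names (`next`, `skipTo`, `docId`, `frequency`) and the class names are assumptions, not Terrier's API. The postings are held as already-decoded arrays, so the sketch only illustrates the skipping control flow; in the compressed on-disk version each skip entry would also need to record the bit offset of its block. The skip interval of roughly the square root of the list length is a common heuristic, assumed here rather than taken from the paper.

```java
import java.util.Arrays;

/** Hypothetical skipping interface plus an in-memory sketch; not Terrier classes. */
final class SkipListSketch {
    interface PostingsStream {
        /** Advances to the next posting; returns false when the list is exhausted. */
        boolean next();
        /** Advances to the first posting with docId >= target; returns false if there is none. */
        boolean skipTo(int target);
        int docId();
        int frequency();
    }

    /** Decoded postings with one skip entry at the start of every block of `interval` postings. */
    static final class ArraySkipPostings implements PostingsStream {
        private final int[] docIds;
        private final int[] freqs;
        private final int interval;
        private final int[] skipDocIds; // docId of the first posting in each block
        private int pos = -1;           // current posting; -1 before the first next()

        ArraySkipPostings(int[] docIds, int[] freqs) {
            this.docIds = docIds;
            this.freqs = freqs;
            this.interval = Math.max(1, (int) Math.sqrt(docIds.length));
            int blocks = (docIds.length + interval - 1) / interval;
            this.skipDocIds = new int[blocks];
            for (int b = 0; b < blocks; b++) skipDocIds[b] = docIds[b * interval];
        }

        public boolean next() { return ++pos < docIds.length; }

        public boolean skipTo(int target) {
            if (pos >= 0 && pos < docIds.length && docIds[pos] >= target) return true;
            // Use the skip index to find the last block starting at or before target.
            int b = Arrays.binarySearch(skipDocIds, target);
            if (b < 0) b = -b - 2;      // insertion point minus one
            if (b >= 0) pos = Math.max(pos, b * interval - 1); // never move backwards
            // Scan linearly inside the block (and beyond, if needed).
            while (next()) {
                if (docIds[pos] >= target) return true;
            }
            return false;
        }

        public int docId() { return docIds[pos]; }
        public int frequency() { return freqs[pos]; }
    }

    public static void main(String[] args) {
        int[] ids = {2, 5, 9, 14, 20, 31, 40, 57, 63};
        int[] tfs = {1, 3, 1, 2, 1, 1, 4, 1, 2};
        PostingsStream ps = new ArraySkipPostings(ids, tfs);
        if (ps.skipTo(30)) System.out.println(ps.docId() + " tf=" + ps.frequency()); // 31 tf=1
        if (!ps.skipTo(100)) System.out.println("exhausted");
    }
}
```

`skipTo()` only ever moves forward, which suits evaluation strategies that visit candidate documents in increasing docid order.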
Step 2: I want to implement Golomb compression and use it. I'll need to modify the same files, and also bitFile.java.
[CM]: Terrier 2 contains an implementation of Golomb coding. However, discussing which code works best might add a nice dimension to your dissertation.
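To compare against gamma, Golomb coding can be sketched in the same style. For a gap x >= 1 and parameter b, the quotient (x-1)/b is written in unary and the remainder in truncated binary; the usual parameter choice for d-gaps is b ≈ 0.69 × N / f_t, where N is the number of documents and the term occurs in f_t of them. This is an illustrative standalone version, not the Terrier 2 implementation mentioned above, and all names in it are hypothetical.

```java
import java.util.BitSet;

/** Standalone Golomb coder for d-gaps; names are hypothetical, not Terrier's API. */
final class GolombSketch {
    static final class BitBuffer {
        final BitSet bits = new BitSet();
        int writePos = 0;
        int readPos = 0;

        void writeBit(boolean bit) { bits.set(writePos++, bit); }
        boolean readBit() { return bits.get(readPos++); }
        /** Writes the n low-order bits of value, most significant first. */
        void writeBits(int value, int n) {
            for (int i = n - 1; i >= 0; i--) writeBit(((value >>> i) & 1) == 1);
        }
        int readBits(int n) {
            int v = 0;
            for (int i = 0; i < n; i++) v = (v << 1) | (readBit() ? 1 : 0);
            return v;
        }
    }

    /** Heuristic parameter for d-gaps: b = ceil(0.69 * N / f_t), at least 1. */
    static int golombParameter(int numDocs, int docFreq) {
        return Math.max(1, (int) Math.ceil(0.69 * numDocs / docFreq));
    }

    static void writeGolomb(BitBuffer out, int x, int b) {
        if (x < 1 || b < 1) throw new IllegalArgumentException("need x >= 1 and b >= 1");
        int q = (x - 1) / b;
        int r = (x - 1) % b;
        for (int i = 0; i < q; i++) out.writeBit(true); // quotient in unary: q ones...
        out.writeBit(false);                             // ...terminated by a zero
        int k = 32 - Integer.numberOfLeadingZeros(b - 1); // ceil(log2 b)
        if (k == 0) return;                              // b == 1: remainder is always 0
        int c = (1 << k) - b;                            // number of short (k-1 bit) codewords
        if (r < c) out.writeBits(r, k - 1);              // truncated binary, short form
        else out.writeBits(r + c, k);                    // truncated binary, long form
    }

    static int readGolomb(BitBuffer in, int b) {
        int q = 0;
        while (in.readBit()) q++;
        int k = 32 - Integer.numberOfLeadingZeros(b - 1);
        int r = 0;
        if (k > 0) {
            int c = (1 << k) - b;
            r = in.readBits(k - 1);
            if (r >= c) r = ((r << 1) | in.readBits(1)) - c; // long form: one more bit
        }
        return q * b + r + 1;
    }

    public static void main(String[] args) {
        int[] gaps = {4, 4, 1, 12, 1};
        int b = golombParameter(100, gaps.length); // hypothetical collection of 100 documents
        BitBuffer buf = new BitBuffer();
        for (int g : gaps) writeGolomb(buf, g, b);
        StringBuilder decoded = new StringBuilder();
        for (int i = 0; i < gaps.length; i++) decoded.append(readGolomb(buf, b)).append(' ');
        System.out.println("b=" + b + ", bits=" + buf.writePos + ", decoded: " + decoded.toString().trim());
    }
}
```

Unlike gamma, Golomb needs a per-term parameter, so b (or f_t, from which it is derived) has to be available when a list is decoded.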
Step 3: The paper deals with the cosine measure, so I'll probably implement it for matching.
[CM]: Cosine is an old-fashioned ranking function; it's more interesting to apply the techniques to DFR weighting models such as PL2.
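For reference, the per-term PL2 weight in its standard DFR form is tfn = tf × log2(1 + c × avgdl / dl), λ = F / N, and w = (tfn × log2(tfn / λ) + (λ - tfn) × log2 e + 0.5 × log2(2π × tfn)) / (tfn + 1), where F is the term's total frequency in the collection and N the number of documents. The sketch below is a standalone illustration, not Terrier's PL2 class: it omits the query-term-frequency multiplier, the class name is hypothetical, and the value of c used in the demo is an assumption.

```java
/** Standalone PL2 term weight (DFR: Poisson model, Laplace after-effect, normalisation 2). */
final class PL2Sketch {
    private static final double LOG2_E = 1.0 / Math.log(2.0);

    static double log2(double x) { return Math.log(x) / Math.log(2.0); }

    /**
     * @param tf           frequency of the term in the document (must be > 0)
     * @param docLength    length of the document in tokens
     * @param avgDocLength average document length in the collection
     * @param collFreq     F, total occurrences of the term in the collection
     * @param numDocs      N, number of documents in the collection
     * @param c            length-normalisation parameter
     */
    static double score(double tf, double docLength, double avgDocLength,
                        double collFreq, double numDocs, double c) {
        double tfn = tf * log2(1.0 + c * avgDocLength / docLength); // normalisation 2
        double lambda = collFreq / numDocs;                           // Poisson mean
        double info = tfn * log2(tfn / lambda)
                    + (lambda - tfn) * LOG2_E
                    + 0.5 * log2(2.0 * Math.PI * tfn);                // Stirling-approximated -log2 P(tfn)
        return info / (tfn + 1.0);                                    // Laplace after-effect
    }

    public static void main(String[] args) {
        // Hypothetical statistics: term seen 50 times in a 1000-document collection.
        System.out.printf("%.4f%n", score(3, 100, 100, 50, 1000, 1.0));
    }
}
```

Unlike cosine, PL2 needs collection-wide statistics (F, N, average document length) at matching time, so these have to be available to the matching code alongside the postings.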
For the other paper, I'll need to create another class, in charge of selecting candidate documents so that not every document in the collection has to be scored.
[CM]: This is a Matching subclass, which uses a PointeredInvertedIndex.
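One well-known way to avoid scoring the whole collection is to cap the number of accumulators (candidate documents), in the spirit of the "continue" strategy: rare terms are processed first and may create accumulators, and once the cap is reached later terms only update existing ones. Whether this matches the approach of the other paper is an assumption. The sketch is standalone, does not use Terrier's Matching or PointeredInvertedIndex classes, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Term-at-a-time scoring with a cap on the number of accumulators; names are hypothetical. */
final class AccumulatorLimitSketch {
    /** Decoded postings for one query term, with precomputed term weights (e.g. from PL2). */
    static final class TermPostings {
        final int[] docIds;
        final double[] weights;
        TermPostings(int[] docIds, double[] weights) { this.docIds = docIds; this.weights = weights; }
    }

    static Map<Integer, Double> score(List<TermPostings> query, int maxAccumulators) {
        List<TermPostings> terms = new ArrayList<>(query);
        terms.sort(Comparator.comparingInt(t -> t.docIds.length)); // rarest (shortest list) first
        Map<Integer, Double> acc = new HashMap<>();
        for (TermPostings t : terms) {
            for (int i = 0; i < t.docIds.length; i++) {
                int d = t.docIds[i];
                Double cur = acc.get(d);
                if (cur != null) {
                    acc.put(d, cur + t.weights[i]);   // existing candidate: always updated
                } else if (acc.size() < maxAccumulators) {
                    acc.put(d, t.weights[i]);         // room left: new candidate
                }
                // Otherwise the posting is ignored. This loop still reads every posting;
                // with a skipping PostingsStream, the capped phase could instead call
                // skipTo() for each accumulated docId and bypass the rest of the list.
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        TermPostings rare = new TermPostings(new int[]{3, 8}, new double[]{4.0, 3.5});
        TermPostings common = new TermPostings(new int[]{1, 2, 3, 5, 8, 9}, new double[]{1, 1, 1, 1, 1, 1});
        System.out.println(score(List.of(common, rare), 3));
    }
}
```

With a cap of 3, documents 2, 5 and 9 are never scored because the common term is processed after the cap is reached.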
[CM]: There are several other papers you'll need to investigate, critique and, where possible, implement.