I'll begin with the Moffat research paper. The idea is to include an internal index in each inverted list, and also to compress the lists with Golomb coding.

I'll proceed in steps for this paper. The paper recommends two types of compression: gamma and Golomb.

Step 1 : I want to use gamma compression, which is already coded in Terrier. I'll change the data stored in the inverted lists, modifying how these data are written and read. I'll only modify 3 files for this: invertedIndex.java, invertedIndexBuilder.java and structureMerger.java.
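
As a reminder of what gamma coding does to a posting list, here is a minimal sketch. It does not use Terrier's own bit-file classes: the class GammaSketch and its methods are hypothetical names, and the bits are kept in a String purely for readability. It assumes document ids are >= 1 and strictly increasing, so that every gap is >= 1.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not Terrier code. Bits are modelled as '0'/'1' characters.
final class GammaSketch {

    // Elias gamma code of x >= 1: floor(log2 x) zero bits, then x in binary.
    static String encode(int x) {
        if (x < 1) {
            throw new IllegalArgumentException("gamma codes integers >= 1");
        }
        String binary = Integer.toBinaryString(x);
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < binary.length(); i++) {
            out.append('0');
        }
        return out.append(binary).toString();
    }

    // Encodes a sorted list of document ids as gamma-coded gaps.
    static String encodeGaps(int[] sortedDocIds) {
        StringBuilder bits = new StringBuilder();
        int previous = 0;
        for (int id : sortedDocIds) {
            bits.append(encode(id - previous));
            previous = id;
        }
        return bits.toString();
    }

    // Reverses encodeGaps: reads gaps and rebuilds the document ids.
    static List<Integer> decodeGaps(String bits) {
        List<Integer> ids = new ArrayList<>();
        int pos = 0;
        int previous = 0;
        while (pos < bits.length()) {
            int zeros = 0;
            while (bits.charAt(pos) == '0') {
                zeros++;
                pos++;
            }
            int gap = Integer.parseInt(bits.substring(pos, pos + zeros + 1), 2);
            pos += zeros + 1;
            previous += gap;
            ids.add(previous);
        }
        return ids;
    }
}
```

For example, the ids 3, 7, 8 give the gaps 3, 4, 1, which are coded as 011, 00100 and 1.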

Step 2 : I want to implement Golomb compression and use it. I'll need to modify the same files and also bitFile.java.
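
To make the target of this step concrete, here is a rough sketch of Golomb coding with parameter b. It is not the code that will go into bitFile.java: GolombSketch and its methods are hypothetical names, bits are again kept in a String, and the unary part is assumed to be written as q one bits followed by a zero. The remainder uses the usual truncated binary code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: not Terrier code. Bits are modelled as '0'/'1' characters.
final class GolombSketch {

    // Bits needed for the longest remainder code: ceil(log2 b), 0 when b == 1.
    static int remainderBits(int b) {
        return 32 - Integer.numberOfLeadingZeros(b - 1);
    }

    // Appends the lowest 'count' bits of value, most significant first.
    static void appendBits(StringBuilder out, int value, int count) {
        for (int i = count - 1; i >= 0; i--) {
            out.append((value >>> i) & 1);
        }
    }

    // Golomb code of x >= 1: quotient in unary, remainder in truncated binary.
    static String encode(int x, int b) {
        if (x < 1 || b < 1) {
            throw new IllegalArgumentException("x and b must be >= 1");
        }
        int q = (x - 1) / b;
        int r = (x - 1) % b;
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < q; i++) {
            out.append('1');
        }
        out.append('0');
        int k = remainderBits(b);
        int cutoff = (1 << k) - b; // remainders below cutoff use k - 1 bits
        if (r < cutoff) {
            appendBits(out, r, k - 1);
        } else {
            appendBits(out, r + cutoff, k);
        }
        return out.toString();
    }

    // Decodes a concatenation of Golomb codes written with the same b.
    static List<Integer> decodeAll(String bits, int b) {
        List<Integer> values = new ArrayList<>();
        int k = remainderBits(b);
        int cutoff = (1 << k) - b;
        int pos = 0;
        while (pos < bits.length()) {
            int q = 0;
            while (bits.charAt(pos) == '1') {
                q++;
                pos++;
            }
            pos++; // skip the 0 that ends the unary part
            int r = 0;
            if (k > 0) {
                for (int i = 0; i < k - 1; i++) {
                    r = (r << 1) | (bits.charAt(pos++) - '0');
                }
                if (r >= cutoff) {
                    r = ((r << 1) | (bits.charAt(pos++) - '0')) - cutoff;
                }
            }
            values.add(q * b + r + 1);
        }
        return values;
    }
}
```

With b = 3, the gaps 5, 1, 3 are coded as 1010, 00 and 011. The parameter b has to be known when reading; a commonly quoted rule of thumb is to choose b close to 0.69 * N / f_t for a term that occurs in f_t of the N documents.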

Step 3 : The paper deals with the cosine measure, so I'll probably implement it for matching.
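
For reference, the cosine measure itself can be sketched as below. This is only the textbook formula over two weight vectors, not a Terrier matching model: CosineSketch is a hypothetical name, and the term weights are assumed to have been computed already (the weighting scheme used in the paper is not reproduced here).

```java
import java.util.Map;

// Illustrative only: plain cosine similarity between two term-weight vectors.
final class CosineSketch {

    // Dot product of the two vectors divided by the product of their lengths.
    static double cosine(Map<String, Double> query, Map<String, Double> doc) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : query.entrySet()) {
            Double w = doc.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double queryNorm = norm(query);
        double docNorm = norm(doc);
        if (queryNorm == 0.0 || docNorm == 0.0) {
            return 0.0; // an empty vector matches nothing
        }
        return dot / (queryNorm * docNorm);
    }

    // Euclidean length of a weight vector.
    static double norm(Map<String, Double> vector) {
        double sum = 0.0;
        for (double w : vector.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }
}
```

In a real matching run, the document norms would be precomputed at indexing time and the dot products accumulated per document while the inverted lists of the query terms are read.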

For the other paper, I'll need to create another class, which is in charge of selecting documents so that the complete computation over the whole collection can be avoided.

last edited 2008-04-10 08:43:14 by CraigMacdonald