Terrier IR Platform
2.2.1

uk.ac.gla.terrier.indexing
Class BasicSinglePassIndexer

java.lang.Object
  extended by uk.ac.gla.terrier.indexing.Indexer
      extended by uk.ac.gla.terrier.indexing.BasicIndexer
          extended by uk.ac.gla.terrier.indexing.BasicSinglePassIndexer
Direct Known Subclasses:
BlockSinglePassIndexer, Hadoop_BasicSinglePassIndexer

public class BasicSinglePassIndexer
extends BasicIndexer

This class indexes a document collection (skipping the direct file construction). It implements a single-pass algorithm that operates in two phases:
First, it traverses the document collection, passes the terms through the TermPipeline and builds an in-memory representation of the posting lists. When main memory is exhausted, it flushes the sorted postings to disk, along with the corresponding partial lexicon, and continues traversing the collection.
The second phase merges the sorted runs (with their partial lexicons) on disk to create the final inverted file. This class follows the template method pattern, so the bulk of the code is reused for block (and field) indexing. A few hook methods choose the right classes to instantiate, depending on the indexing options defined.
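The hook-method arrangement described above is the template method pattern: the base class fixes the indexing skeleton, and subclasses (such as a block variant) override a factory hook to choose which posting classes are instantiated. The following is a minimal sketch of that idea only; every type name in it is hypothetical and not part of Terrier's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types: PostingBuilder, SinglePassSkeleton, BasicSkeleton and
// BlockSkeleton are illustrative names, not Terrier classes.
interface PostingBuilder {
    // Builds a textual posting for one document, given the term's positions in it.
    String build(int docId, List<Integer> positions);
}

abstract class SinglePassSkeleton {
    // Template method: the shared algorithm is identical for every variant.
    final List<String> indexTerm(List<List<Integer>> positionsPerDoc) {
        PostingBuilder builder = createPostingBuilder(); // hook picks the class to use
        List<String> postings = new ArrayList<>();
        for (int docId = 0; docId < positionsPerDoc.size(); docId++) {
            List<Integer> positions = positionsPerDoc.get(docId);
            if (!positions.isEmpty()) {
                postings.add(builder.build(docId, positions));
            }
        }
        return postings;
    }

    // Hook method overridden by each subclass.
    protected abstract PostingBuilder createPostingBuilder();
}

class BasicSkeleton extends SinglePassSkeleton {
    @Override
    protected PostingBuilder createPostingBuilder() {
        // Basic indexing keeps only (docid, term frequency).
        return (docId, positions) -> "(" + docId + "," + positions.size() + ")";
    }
}

class BlockSkeleton extends SinglePassSkeleton {
    @Override
    protected PostingBuilder createPostingBuilder() {
        // Block indexing additionally keeps the position of each occurrence.
        return (docId, positions) -> "(" + docId + "," + positions.size() + "," + positions + ")";
    }
}
```

Only the hook differs between the two subclasses; the loop in indexTerm is written once and shared.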

Properties:

Version:
$Revision: 1.11 $
Author:
Roi Blanco

Constructor Summary
BasicSinglePassIndexer(java.lang.String pathname, java.lang.String prefix)
          Constructs an instance of a BasicSinglePassIndexer, using the given path name for storing the data structures.
 
Method Summary
 void createDirectIndex(Collection[] collections)
          Creates the direct index, the document index and the lexicon.
 void createInvertedIndex()
          Creates the inverted index after having created the direct index, document index and lexicon.
 void createInvertedIndex(Collection[] collections)
          Builds the inverted file and lexicon file for the given collections. Loops through each document in each of the collections, extracting terms and pushing these through the Term Pipeline (e.g. stemming, stopping, lowercasing).
 void performMultiWayMerge()
          Uses the merger class to perform a k-way merge of a set of previously written runs.
 
Methods inherited from class uk.ac.gla.terrier.indexing.Indexer
index, isUTFIndexing, main, merge, merge, useFieldInformation
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

BasicSinglePassIndexer

public BasicSinglePassIndexer(java.lang.String pathname,
                              java.lang.String prefix)
Constructs an instance of a BasicSinglePassIndexer, using the given path name for storing the data structures.

Parameters:
pathname - String the path where the data structures will be created.
Method Detail

createDirectIndex

public void createDirectIndex(Collection[] collections)
Description copied from class: BasicIndexer
Creates the direct index, the document index and the lexicon. Loops through each document in each of the collections, extracting terms and pushing these through the Term Pipeline (e.g. stemming, stopping, lowercasing).

Overrides:
createDirectIndex in class BasicIndexer
Parameters:
collections - Collection[] the collections to be indexed.
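Each extracted term passes through the Term Pipeline, a chain of stages that each transform or drop the term before handing it on. The following is a minimal sketch of that chaining idea; the Stage interface and the stage classes are hypothetical simplifications, not Terrier's own pipeline classes, and the suffix-stripping stage is a toy placeholder for a real stemmer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical, simplified pipeline stage: processes a term, then forwards it.
interface Stage {
    void process(String term);
}

final class LowercaseStage implements Stage {
    private final Stage next;
    LowercaseStage(Stage next) { this.next = next; }
    public void process(String term) { next.process(term.toLowerCase(Locale.ROOT)); }
}

final class StopwordStage implements Stage {
    private final Set<String> stopwords;
    private final Stage next;
    StopwordStage(Set<String> stopwords, Stage next) { this.stopwords = stopwords; this.next = next; }
    public void process(String term) {
        if (!stopwords.contains(term)) {
            next.process(term); // stopwords are dropped and never reach later stages
        }
    }
}

final class SuffixStripStage implements Stage {
    private final Stage next;
    SuffixStripStage(Stage next) { this.next = next; }
    public void process(String term) {
        // Toy placeholder for a stemmer: strips one trailing "s" from longer terms.
        String stem = term.length() > 3 && term.endsWith("s")
                ? term.substring(0, term.length() - 1)
                : term;
        next.process(stem);
    }
}

// End of the chain: collects the terms that survive every stage.
final class Collector implements Stage {
    final List<String> terms = new ArrayList<>();
    public void process(String term) { terms.add(term); }
}

final class PipelineDemo {
    static List<String> run(String text) {
        Collector sink = new Collector();
        Stage pipeline = new LowercaseStage(
                new StopwordStage(Set.of("the", "of"), new SuffixStripStage(sink)));
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                pipeline.process(token);
            }
        }
        return sink.terms;
    }
}
```

Because each stage only knows its successor, stages can be added, removed or reordered without changing the indexing loop that feeds the first stage.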

createInvertedIndex

public void createInvertedIndex()
Description copied from class: BasicIndexer
Creates the inverted index after having created the direct index, document index and lexicon.

Overrides:
createInvertedIndex in class BasicIndexer

createInvertedIndex

public void createInvertedIndex(Collection[] collections)
Builds the inverted file and lexicon file for the given collections. Loops through each document in each of the collections, extracting terms and pushing these through the Term Pipeline (e.g. stemming, stopping, lowercasing).

Parameters:
collections - Collection[] the collections to be indexed.
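The in-memory part of this loop, where postings accumulate until memory runs out and are then flushed as a sorted run, can be sketched as follows. This is an illustration under stated assumptions: RunBuilder is a hypothetical class, a fixed posting count stands in for the real memory check, and each run is kept as an in-memory list of text lines rather than written to disk with a partial lexicon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the first phase of single-pass indexing.
final class RunBuilder {
    private final int maxPostingsInMemory; // stand-in for "main memory exhausted"
    private final TreeMap<String, List<int[]>> postings = new TreeMap<>(); // term -> (docid, tf), sorted by term
    private int postingCount = 0;
    final List<List<String>> runs = new ArrayList<>(); // each run: lines "term docid:tf ..."

    RunBuilder(int maxPostingsInMemory) {
        this.maxPostingsInMemory = maxPostingsInMemory;
    }

    void addDocument(int docId, List<String> terms) {
        // Count term frequencies within this document.
        Map<String, Integer> tf = new TreeMap<>();
        for (String t : terms) {
            tf.merge(t, 1, Integer::sum);
        }
        // Append one posting per distinct term to its in-memory posting list.
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .add(new int[] {e.getValue() == null ? 0 : docId, e.getValue()});
            postingCount++;
        }
        if (postingCount >= maxPostingsInMemory) {
            flush();
        }
    }

    void flush() {
        if (postings.isEmpty()) {
            return;
        }
        // TreeMap iteration is in term order, so each run is already sorted.
        List<String> run = new ArrayList<>();
        for (Map.Entry<String, List<int[]>> e : postings.entrySet()) {
            StringBuilder line = new StringBuilder(e.getKey());
            for (int[] p : e.getValue()) {
                line.append(' ').append(p[0]).append(':').append(p[1]);
            }
            run.add(line.toString());
        }
        runs.add(run); // a real indexer would write this run to disk instead
        postings.clear();
        postingCount = 0;
    }
}
```

Documents are processed in increasing docid order, so postings inside each list, and runs relative to one another, are also ordered by docid; the merge phase relies on this.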

performMultiWayMerge

public void performMultiWayMerge()
Uses the merger class to perform a k-way merge of a set of previously written runs. The file names and the number of runs are taken from a private queue.
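Merging sorted runs is a classic k-way merge driven by a priority queue. The sketch below assumes each run is an in-memory list of "term postings" lines sorted by term, and that runs are numbered in the order they were written, so document ids in later runs are larger. RunMerger is a hypothetical name, not the merger class this method uses, and real runs would be read from the files named in the queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a k-way merge over sorted runs.
final class RunMerger {
    // Cursor over one run: the run's index and the current line within it.
    private static final class Cursor {
        final int run;
        int pos;
        Cursor(int run) { this.run = run; }
    }

    // Each run is a list of "term postings" lines, sorted by term.
    static List<String> merge(List<List<String>> runs) {
        // Order by current term; ties go to the earlier run, keeping docids ascending.
        PriorityQueue<Cursor> heap = new PriorityQueue<>((a, b) -> {
            int cmp = term(runs, a).compareTo(term(runs, b));
            return cmp != 0 ? cmp : Integer.compare(a.run, b.run);
        });
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new Cursor(r));
            }
        }
        List<String> merged = new ArrayList<>();
        String currentTerm = null;
        StringBuilder current = null;
        while (!heap.isEmpty()) {
            Cursor c = heap.poll(); // removed before advancing, so the heap stays valid
            String line = runs.get(c.run).get(c.pos);
            int space = line.indexOf(' ');
            String term = line.substring(0, space);
            if (term.equals(currentTerm)) {
                current.append(' ').append(line.substring(space + 1)); // same term, later run
            } else {
                if (current != null) {
                    merged.add(current.toString());
                }
                currentTerm = term;
                current = new StringBuilder(line);
            }
            c.pos++;
            if (c.pos < runs.get(c.run).size()) {
                heap.add(c); // re-insert with its next line as the new key
            }
        }
        if (current != null) {
            merged.add(current.toString());
        }
        return merged;
    }

    private static String term(List<List<String>> runs, Cursor c) {
        String line = runs.get(c.run).get(c.pos);
        return line.substring(0, line.indexOf(' '));
    }
}
```

With k runs holding n lines in total, each line enters and leaves the heap once, giving O(n log k) comparisons; breaking ties by run number lets postings for the same term be concatenated in docid order without re-sorting.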



Terrier Information Retrieval Platform 2.2.1. Copyright 2004-2008 University of Glasgow