Terrier Indexing architecture rough diagram:
This describes the current implementation as it stands in Terrier 1.0.0.
Collection: represents a collection.
methods: boolean endOfCollection(); boolean nextDocument(); Document getDocument(); String docid(); void reset();
Document: represents a document of a collection
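A rough sketch of how the Collection methods listed above might be used to walk a collection. This is not Terrier's source: the interface is named DocCollection only to avoid clashing with java.util.Collection, the Doc interface and its text() method are hypothetical (no Document methods are listed in these notes), and the semantics assumed here are that nextDocument() returns false once the collection is exhausted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Document; these notes list no methods for it.
interface Doc {
    String text();
}

// Mirrors the Collection method list above.
// Renamed only to avoid a clash with java.util.Collection.
interface DocCollection {
    boolean endOfCollection();
    boolean nextDocument();
    Doc getDocument();
    String docid();
    void reset();
}

// Toy in-memory collection, for illustration only.
class InMemoryCollection implements DocCollection {
    private final List<String> ids;
    private final List<String> texts;
    private int pos = -1; // index of the current document; -1 before the first

    InMemoryCollection(List<String> ids, List<String> texts) {
        this.ids = ids;
        this.texts = texts;
    }

    // Assumed meaning: true when no further document can be reached.
    public boolean endOfCollection() { return pos + 1 >= ids.size(); }

    // Assumed meaning: advance to the next document, false if there is none.
    public boolean nextDocument() {
        if (endOfCollection()) return false;
        pos++;
        return true;
    }

    public Doc getDocument() {
        final String t = texts.get(pos);
        return () -> t;
    }

    public String docid() { return ids.get(pos); }

    public void reset() { pos = -1; }
}

class CollectionWalk {
    // Visits every document once and returns the docids in order.
    static List<String> docids(DocCollection c) {
        List<String> seen = new ArrayList<>();
        c.reset();
        while (c.nextDocument()) {
            seen.add(c.docid());
        }
        return seen;
    }
}
```

An Indexer would presumably run a loop of this shape, calling getDocument() inside it to pull the terms out of each document.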
Indexer: uses a Collection to get each Document, extracts its terms, then stops and stems them
methods: void buildDirect(); void buildInverted();
Uses a list of named fields from the properties file to record which fields each term occurs in. This list can then be used at query time to check whether a required term occurs in a given field. Example: Query(intitle:"index of" mp3)
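The field bookkeeping described above could look roughly like this for a single document. The class FieldIndex and its methods are made up for illustration and are not Terrier classes. The field name "title" is only a guess at what an intitle: query would map to.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: remembers, for one document, which named
// fields each term was seen in.
class FieldIndex {
    // term -> names of the fields in which the term occurred
    private final Map<String, Set<String>> fieldsByTerm = new HashMap<>();

    // Called at indexing time for every (term, field) occurrence.
    void add(String term, String field) {
        fieldsByTerm.computeIfAbsent(term, k -> new HashSet<>()).add(field);
    }

    // Called at query time: does the term occur in the given field?
    boolean occursIn(String term, String field) {
        Set<String> fields = fieldsByTerm.get(term);
        return fields != null && fields.contains(field);
    }
}
```

For the example query, a matcher would ask occursIn("index", "title") for the intitle: part and treat mp3 as an ordinary term with no field restriction.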
The Indexer passes terms to a TermPipeline, which performs stopping and stemming (and other options, e.g. translation)
Finally, it invokes the InvertedIndexBuilder
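A minimal sketch of the pipeline idea: each stage either drops a term or hands it to the next stage. The names TermStage, StopStage, ToyStemStage, CollectStage and PipelineDemo are all hypothetical, the processTerm method name is an assumption, and the "stemmer" just strips a trailing "s" to keep the example short (a real setup would plug in something like a Porter stemmer).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// One stage of the pipeline; the method name is an assumption.
interface TermStage {
    void processTerm(String term);
}

// Drops stopwords and forwards everything else.
class StopStage implements TermStage {
    private final Set<String> stopwords;
    private final TermStage next;

    StopStage(Set<String> stopwords, TermStage next) {
        this.stopwords = stopwords;
        this.next = next;
    }

    public void processTerm(String term) {
        if (!stopwords.contains(term)) next.processTerm(term);
    }
}

// Toy stemmer: strips a single trailing "s" from longer terms.
class ToyStemStage implements TermStage {
    private final TermStage next;

    ToyStemStage(TermStage next) { this.next = next; }

    public void processTerm(String term) {
        boolean strip = term.length() > 3 && term.endsWith("s");
        next.processTerm(strip ? term.substring(0, term.length() - 1) : term);
    }
}

// End of the pipeline: collects whatever survives the earlier stages.
// In the real system this is where index building would take over.
class CollectStage implements TermStage {
    final List<String> terms = new ArrayList<>();

    public void processTerm(String term) { terms.add(term); }
}

class PipelineDemo {
    // Lowercases and splits the text, then pushes each token through
    // stop -> stem -> collect.
    static List<String> run(String text) {
        CollectStage sink = new CollectStage();
        Set<String> stop = new HashSet<>(Arrays.asList("the", "of", "and"));
        TermStage pipeline = new StopStage(stop, new ToyStemStage(sink));
        for (String t : text.toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) pipeline.processTerm(t);
        }
        return sink.terms;
    }
}
```

With this toy setup, PipelineDemo.run("The index of terms and documents") yields [index, term, document]. Chaining stages this way means an extra step such as translation can be slotted in without touching the others.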
I think I need to implement my own Collection/Document/Indexer
You probably want to read Terrier/XMLCollections