CORE TODO List
1.0
The Querying API doesn't support term expansion by the term pipeline, i.e. one term morphing into more than one term (for example, an abbreviation expander)
Open With dialog for Desktop Terrier on Mac OS X
Faster PDF parsing, perhaps calling pdf2text if available.
Postscript parsing (via PDF?)
1.1+
Ant task for compiling Terrier
Alternatively, provisions for compiling Terrier on Windows.
Terrier's own Exceptions for setup, indexing, querying
Integration with log4j possibly
B-tree based lexicon file format
More provisions for multiple languages (Unicode, encodings, stemmers)
Move all Binary Trees to Threaded Trees, which would allow non-recursive traversals, thus preventing stack overflows, particularly when block indexing documents that have large numbers of repeated terms (e.g. spreadsheets)
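As an illustration of the threaded tree idea, here is a minimal sketch of a right-threaded binary search tree whose in-order traversal needs neither recursion nor an explicit stack. The class name ThreadedTermTree and its methods are hypothetical and are not taken from Terrier's existing tree classes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical right-threaded binary search tree of terms (not Terrier's own class). */
final class ThreadedTermTree {
    private static final class Node {
        final String term;
        int frequency = 1;
        Node left;
        Node right;            // real right child, or the in-order successor if rightIsThread
        boolean rightIsThread;
        Node(String term) { this.term = term; }
    }

    private Node root;

    /** Inserts a term iteratively, or increments its frequency if it is already present. */
    void insert(String term) {
        if (root == null) { root = new Node(term); return; }
        Node current = root;
        while (true) {
            int cmp = term.compareTo(current.term);
            if (cmp == 0) { current.frequency++; return; }
            if (cmp < 0) {
                if (current.left == null) {
                    Node n = new Node(term);
                    n.right = current;          // thread to the in-order successor
                    n.rightIsThread = true;
                    current.left = n;
                    return;
                }
                current = current.left;
            } else {
                if (current.right == null || current.rightIsThread) {
                    Node n = new Node(term);
                    n.right = current.right;    // inherit the parent's thread (may be null)
                    n.rightIsThread = current.rightIsThread;
                    current.right = n;
                    current.rightIsThread = false;
                    return;
                }
                current = current.right;
            }
        }
    }

    /** In-order traversal that follows threads instead of recursing. */
    List<String> inOrder() {
        List<String> out = new ArrayList<>();
        Node current = leftmost(root);
        while (current != null) {
            out.add(current.term + ":" + current.frequency);
            current = current.rightIsThread ? current.right : leftmost(current.right);
        }
        return out;
    }

    private static Node leftmost(Node n) {
        if (n == null) return null;
        while (n.left != null) n = n.left;
        return n;
    }
}
```

Because both insert and inOrder are loops, the call stack depth stays constant however unbalanced the tree becomes. The cost is one extra flag per node.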
Refinement of desktop search application with improved interface and parsers for more types of documents, as well as better integration with common Operating Systems
Refinement of the query language in order to remove the ambiguity warnings when generating the parser with ANTLR
1.2+
Thread safety
FUTURE IDEAS
See Also: Terrier/SuggestedAPIChanges
PreIndexing phase support
Allow phases prior to and after indexing to occur in Java, rather than Perl/Bash scripts. Example - Adding Anchor texts to collection: preIndexing phase
extract links
  scans collection
  builds up URL=>DOCID and URL=>[Anchors]
add links
  transforms URL=>[Anchors] into DOCID=>[Anchors]
  alters/builds new collection
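The transform step of the "add links" phase could be sketched as follows. This assumes both maps fit in memory and that docids are plain integers. AnchorTextRemapper and toDocidAnchors are hypothetical names, not existing Terrier classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for the anchor-text pre-indexing phase. */
final class AnchorTextRemapper {
    private AnchorTextRemapper() {}

    /**
     * Transforms URL=>[Anchors] into DOCID=>[Anchors].
     * Anchors pointing at URLs that are not in the collection are dropped.
     */
    static Map<Integer, List<String>> toDocidAnchors(
            Map<String, Integer> urlToDocid,
            Map<String, List<String>> urlToAnchors) {
        Map<Integer, List<String>> docidToAnchors = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : urlToAnchors.entrySet()) {
            Integer docid = urlToDocid.get(entry.getKey());
            if (docid == null) {
                continue; // link target lies outside the collection
            }
            docidToAnchors.computeIfAbsent(docid, k -> new ArrayList<>())
                          .addAll(entry.getValue());
        }
        return docidToAnchors;
    }
}
```

For a large collection the two maps would probably need to live on disk. The sketch only shows the shape of the join between them.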
Other post-indexing phases may also exist, so perhaps we should generalise this and provide a Runnable-like interface, e.g. run(ApplicationSetup, Collection) and run(DirectIndex, Lexicon, InvertedIndex)
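One possible shape for such a Runnable-like interface is sketched below. To keep the example self-contained, the Terrier types named above are replaced by a generic context parameter C. IndexingPhase and PhaseRunner are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical phase that runs against a context C, e.g. a collection or a set of index structures. */
interface IndexingPhase<C> {
    void run(C context);
}

/** Runs registered phases in the order they were added. */
final class PhaseRunner<C> {
    private final List<IndexingPhase<C>> phases = new ArrayList<>();

    PhaseRunner<C> add(IndexingPhase<C> phase) {
        phases.add(phase);
        return this;
    }

    void runAll(C context) {
        for (IndexingPhase<C> phase : phases) {
            phase.run(context);
        }
    }
}
```

A pre-indexing runner would then take a context wrapping the setup and the collection, and a post-indexing runner one wrapping the direct index, lexicon and inverted index.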
Generic File opening/saving API
Suggested by: Craig Macdonald
I'm fed up writing the same code over and over again for opening gzipped text files.
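A minimal sketch of what such a helper might look like, using only the standard java.util.zip and java.nio.file classes. It assumes compression can be detected from a ".gz" file name suffix and that text is UTF-8. TextFiles, openReader and openWriter are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper that hides gzip handling behind plain reader/writer calls. */
final class TextFiles {
    private TextFiles() {}

    /** Opens a text file for reading, transparently decompressing *.gz files. */
    static BufferedReader openReader(Path file) throws IOException {
        InputStream in = Files.newInputStream(file);
        try {
            if (isGzipped(file)) {
                in = new GZIPInputStream(in);
            }
        } catch (IOException e) {
            in.close(); // do not leak the raw stream if the gzip header is invalid
            throw e;
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    /** Opens a text file for writing, transparently compressing *.gz files. */
    static BufferedWriter openWriter(Path file) throws IOException {
        OutputStream out = Files.newOutputStream(file);
        try {
            if (isGzipped(file)) {
                out = new GZIPOutputStream(out);
            }
        } catch (IOException e) {
            out.close();
            throw e;
        }
        return new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }

    // Assumption: compression is signalled by the file name, not by sniffing magic bytes.
    private static boolean isGzipped(Path file) {
        Path name = file.getFileName();
        return name != null && name.toString().endsWith(".gz");
    }
}
```

Callers would use try-with-resources, for example `try (BufferedReader r = TextFiles.openReader(path)) { ... }`, and never mention GZIPInputStream directly.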
Build straight to InvertedFile
Suggested by: VassilisPlachouras
Index straight to the inverted index file, avoiding direct index creation
Removes need for termids mapping to be kept during index creation
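A rough sketch of the idea, under the strong assumption that all postings fit in memory: posting lists are accumulated per term string as documents arrive, so no direct index and no termid mapping is needed while indexing. SinglePassInverter is a hypothetical name. A real implementation would have to flush partial runs to disk and merge them, which is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory single-pass inverter keyed by term string rather than termid. */
final class SinglePassInverter {
    // term -> postings as {docid, term frequency} pairs, in increasing docid order
    private final Map<String, List<int[]>> postings = new TreeMap<>();

    /** Adds one document's terms; documents must be added in increasing docid order. */
    void addDocument(int docid, List<String> terms) {
        Map<String, Integer> termFrequencies = new TreeMap<>();
        for (String term : terms) {
            termFrequencies.merge(term, 1, Integer::sum);
        }
        for (Map.Entry<String, Integer> e : termFrequencies.entrySet()) {
            postings.computeIfAbsent(e.getKey(), k -> new ArrayList<>())
                    .add(new int[] {docid, e.getValue()});
        }
    }

    /** Renders each posting list as "term -> docid:tf docid:tf", in lexicographic term order. */
    List<String> dump() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<int[]>> e : postings.entrySet()) {
            StringBuilder sb = new StringBuilder(e.getKey()).append(" ->");
            for (int[] posting : e.getValue()) {
                sb.append(' ').append(posting[0]).append(':').append(posting[1]);
            }
            lines.add(sb.toString());
        }
        return lines;
    }
}
```

Since the map is sorted by term, termids could be assigned once, in lexicographic order, when the lists are finally written out.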
CategoryTerrier