NPL collection
The NPL collection (also known as the VASWANI collection) contains around
10,000 document titles. It has a bit of a reputation for messing up people's
experiments.
The bits
- doc-text - The documents in text form
- query-text - The queries in text form
- doc-vecs - The documents represented by term IDs
- query-vecs - The queries represented by term IDs
- term-vocab - Table of terms with corresponding IDs
- rlv-ass - Relevance assessments
- term-vecs - A file of numbers, don't know what they do
- term-mst - Another file of numbers
- npl.tar.gz - All the bits put together
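For anyone wanting to load the text files, here is a minimal Python sketch. It rests on an assumption not stated above: that doc-text, query-text and rlv-ass store one record per ID, with the ID alone on the first line, the content on the following lines, and a line holding only "/" closing the record. The function names (read_records, parse_text, parse_relevance) are made up for illustration; check the layout against the actual files before relying on this.

```python
from typing import Dict, Iterable, List


def read_records(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Group lines into records keyed by the integer ID that opens each record.

    ASSUMPTION: every record is terminated by a line containing only "/".
    """
    records: Dict[int, List[str]] = {}
    current_id = None
    body: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line == "/":
            # End of record: store it and reset the state.
            if current_id is not None:
                records[current_id] = body
            current_id, body = None, []
        elif current_id is None:
            current_id = int(line)  # first line of a record is its ID
        else:
            body.append(line)
    return records


def parse_text(lines: Iterable[str]) -> Dict[int, str]:
    """Parse a doc-text or query-text style file into {id: text}."""
    return {rid: " ".join(body) for rid, body in read_records(lines).items()}


def parse_relevance(lines: Iterable[str]) -> Dict[int, List[int]]:
    """Parse an rlv-ass style file into {query_id: [relevant_doc_ids]}."""
    return {
        rid: [int(tok) for line in body for tok in line.split()]
        for rid, body in read_records(lines).items()
    }
```

Both parsers accept any iterable of lines, so an open file handle can be passed directly, for example parse_text(open("doc-text")) once npl.tar.gz has been unpacked.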