Labrador is a distributed web crawler (or spider) written in Perl. It was designed and implemented in-house for integration with the Terrier information retrieval platform.
URL:
http://www.dcs.gla.ac.uk/~craigm/labrador/
TODO
Throttle on IP address instead of hostname
Throttle on DNS domain (see RegistrarBoundaries.pm from SpamAssassin 3.0.x)
Support updating existing crawls
Save directly to TREC-formatted directory structures and docids
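The first two TODO items change which fetches share a politeness delay. Throttling on IP address groups virtual hosts served by one machine. Throttling on registered domain groups every hostname under one registrar boundary.

The sketch below illustrates this idea under stated assumptions. It is not Labrador's code, which is in Perl. The names `Throttle`, `key_by_hostname`, `make_key_by_ip`, `key_by_registered_domain` and `TWO_LABEL_SUFFIXES` are hypothetical. The suffix set is a deliberately incomplete stand-in for a full registrar-boundary list such as the one in SpamAssassin's RegistrarBoundaries.pm.

```python
import socket
import time
from typing import Callable, Dict, Optional

# Hypothetical sketch only: a "key function" decides which hosts share one
# politeness delay. Labrador's actual (Perl) scheduler is not shown here.
KeyFunc = Callable[[str], str]


def key_by_hostname(host: str) -> str:
    """One delay budget per hostname (normalised to lower case, no trailing dot)."""
    return host.lower().rstrip(".")


def make_key_by_ip(resolve: Callable[[str], str] = socket.gethostbyname) -> KeyFunc:
    """One delay budget per IP address, so virtual hosts on one server share it.

    The resolver is injectable so the sketch can be exercised without DNS.
    """
    def key(host: str) -> str:
        return resolve(key_by_hostname(host))
    return key


# Illustrative, incomplete set of two-label public suffixes (an assumption).
# A real crawler needs a complete list, e.g. the one behind RegistrarBoundaries.pm.
TWO_LABEL_SUFFIXES = {"ac.uk", "co.uk", "org.uk"}


def key_by_registered_domain(host: str) -> str:
    """One delay budget per registered domain, e.g. www.dcs.gla.ac.uk -> gla.ac.uk."""
    labels = key_by_hostname(host).split(".")
    if len(labels) >= 3 and ".".join(labels[-2:]) in TWO_LABEL_SUFFIXES:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


class Throttle:
    """Enforces a minimum delay between fetches that map to the same key."""

    def __init__(self, delay: float, key_func: KeyFunc) -> None:
        self.delay = delay
        self.key_func = key_func
        self._next_allowed: Dict[str, float] = {}

    def wait_time(self, host: str, now: Optional[float] = None) -> float:
        """Seconds to wait before this host may be fetched again (0.0 if allowed now)."""
        now = time.monotonic() if now is None else now
        return max(0.0, self._next_allowed.get(self.key_func(host), 0.0) - now)

    def record_fetch(self, host: str, now: Optional[float] = None) -> None:
        """Start a new delay window for every host that shares this host's key."""
        now = time.monotonic() if now is None else now
        self._next_allowed[self.key_func(host)] = now + self.delay
```

For example, suppose a fake resolver maps `a.example.com` and `b.example.com` to the same address. Then `Throttle(5.0, make_key_by_ip(fake_dns.__getitem__))` makes `b.example.com` wait 4 seconds one second after `a.example.com` was fetched. A hostname-keyed throttle would let it through immediately. Keeping the grouping policy in a separate key function lets the scheduler stay unchanged while the policy switches from hostname to IP address to registered domain.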