Diff for "Labrador"

Differences between revisions 1 and 2

Line 4: Line 4:

== TODO ==
 * Throttle on IP instead of Hostname
 * Throttle on DNS domain (see RegistrarBoundaries.pm from SpamAssassin 3.0.x)
 * Updating crawls
 * Save directly to TREC-formatted directory structures and docids

Labrador is a distributed web crawler (or spider) written in Perl. It was designed and implemented locally for integration with the Terrier information retrieval platform.

URL: http://www.dcs.gla.ac.uk/~craigm/labrador/

TODO

  • Throttle on IP instead of Hostname

  • Throttle on DNS domain (see RegistrarBoundaries.pm from SpamAssassin 3.0.x)

  • Updating crawls

  • Save directly to TREC-formatted directory structures and docids
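
The first TODO item, throttling on IP rather than hostname, could look roughly like the sketch below. It is written in Python for brevity and is not Labrador's Perl code: the `IpThrottle` class, its `min_delay` parameter, and the `wait_time` method are all hypothetical names chosen for illustration. The idea is that several hostnames served from one machine share a single politeness delay, because the delay is keyed on the resolved address.

```python
import socket
import time
from urllib.parse import urlsplit


class IpThrottle:
    """Hypothetical helper: minimum delay between fetches to the same IP."""

    def __init__(self, min_delay=2.0, resolve=socket.gethostbyname,
                 clock=time.monotonic):
        self.min_delay = min_delay   # seconds between fetches per IP (assumed value)
        self.resolve = resolve       # hostname -> IP string; injectable for testing
        self.clock = clock           # time source; injectable for testing
        self.next_allowed = {}       # IP -> earliest time the next fetch may start

    def wait_time(self, url):
        """Reserve the next fetch slot for url's IP; return seconds to wait."""
        host = urlsplit(url).hostname
        if host is None:
            raise ValueError(f"no hostname in URL: {url!r}")
        ip = self.resolve(host)
        now = self.clock()
        # Start at the reserved time if the IP is still cooling down, else now.
        start = max(now, self.next_allowed.get(ip, now))
        self.next_allowed[ip] = start + self.min_delay
        return start - now
```

A crawler loop would call `time.sleep(throttle.wait_time(url))` before each fetch. Throttling on DNS domain, the second item, would fit the same shape by passing a `resolve` function that maps a hostname to its registered domain instead of its address.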

last edited 2005-04-23 14:31:09 by CraigMacdonald