DepthFirstCrawling

DepthFirstCrawling is little used by crawlers because it gives poor coverage whenever the crawl cannot be a "complete crawl", i.e. whenever the crawl must be terminated before all pages are downloaded. The Internet is too large to crawl completely, so the pages crawled must be carefully selected.

Depth-first traversal of the link tree is poor because the crawler tends to quickly become 'engrossed' deep in the first site instead of exploring every site to a similar depth. This is related to the heuristic that the pages nearer the entry point of a website are likely to be more interesting: a depth-first crawl that is cut short spends most of its page budget far from the entry points.
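
The effect can be illustrated with a minimal sketch, assuming a tiny hypothetical link graph (the LINKS table, the "seed" entry page, and the crawl function are invented for illustration and do not come from any real crawler). The only difference between the two strategies here is whether the frontier is used as a stack (depth first) or as a queue (breadth first); the budget parameter stands in for terminating the crawl early.

```python
from collections import deque

# Hypothetical link graph: a seed page linking to two sites,
# each site being a simple chain of pages.
LINKS = {
    "seed": ["a.example/", "b.example/"],
    "a.example/": ["a.example/1"],
    "a.example/1": ["a.example/2"],
    "a.example/2": ["a.example/3"],
    "a.example/3": [],
    "b.example/": ["b.example/1"],
    "b.example/1": ["b.example/2"],
    "b.example/2": ["b.example/3"],
    "b.example/3": [],
}


def crawl(start, budget, depth_first):
    """Fetch at most `budget` pages, returning them in fetch order."""
    frontier = deque([start])
    seen = {start}
    fetched = []
    while frontier and len(fetched) < budget:
        # Stack (pop from the right) for depth first,
        # queue (pop from the left) for breadth first.
        url = frontier.pop() if depth_first else frontier.popleft()
        fetched.append(url)
        links = LINKS.get(url, [])
        # For depth first, push links in reverse so the first link
        # on the page is the next one to be followed.
        for link in (reversed(links) if depth_first else links):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return fetched


dfs_pages = crawl("seed", budget=5, depth_first=True)
bfs_pages = crawl("seed", budget=5, depth_first=False)
print("depth first:  ", dfs_pages)
print("breadth first:", bfs_pages)
```

With a budget of five pages on this toy graph, the depth-first crawl spends everything on a.example and never reaches b.example, while the breadth-first crawl fetches the entry page and first sub-page of both sites.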

Other crawling/traversal techniques: BreadthFirstCrawling

last edited 2005-01-19 16:24:50 by CraigMacdonald