Breadth-first crawling is a common strategy for crawling the Internet. A crawling strategy is needed because the Internet is too large for every document to be downloaded, so the crawler must prioritise which pages to fetch first. A breadth-first crawler downloads every page in its rootset before moving on to the links it found on those pages, and then continues level by level. Compared with DepthFirstCrawling, the crawler is less likely to become engrossed in a single site while other sites remain uncrawled.
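
The ordering described above can be sketched with a first-in, first-out queue. This is only a minimal illustration, not a real crawler: the function name `breadth_first_crawl`, the `fetch_links` callback, the `max_pages` budget and the toy link graph `TOY_WEB` are all assumptions made for the example, and no network access, politeness delay or robots.txt handling is shown.

```python
from collections import deque


def breadth_first_crawl(seeds, fetch_links, max_pages):
    """Return URLs in the order a breadth-first crawler would download them.

    seeds       -- the rootset (iterable of URLs)
    fetch_links -- hypothetical callback: URL -> list of URLs linked from it
    max_pages   -- download budget, since the whole Web cannot be fetched
    """
    unique_seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    frontier = deque(unique_seeds)  # FIFO queue: oldest discovered URL goes first
    seen = set(unique_seeds)        # URLs already queued, so none is fetched twice
    order = []

    while frontier and len(order) < max_pages:
        url = frontier.popleft()
        order.append(url)  # stands in for "download this page"
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)  # newly found links wait behind older ones
    return order


# Hypothetical toy Web: two sites, one of which has a deeper page.
TOY_WEB = {
    "a.example/": ["a.example/1", "a.example/2"],
    "b.example/": ["b.example/1"],
    "a.example/1": ["a.example/1/deep"],
    "a.example/2": [],
    "b.example/1": [],
    "a.example/1/deep": [],
}


def toy_fetch_links(url):
    """Look up outgoing links in the toy graph instead of fetching a page."""
    return TOY_WEB.get(url, [])


crawl_order = breadth_first_crawl(
    ["a.example/", "b.example/"], toy_fetch_links, max_pages=5
)
print(crawl_order)
```

With a budget of five pages, both rootset pages and all of their direct links are downloaded, and the deeper page `a.example/1/deep` is the one left out. Replacing `popleft()` with `pop()` would turn the queue into a stack and give a depth-first style ordering instead.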

Najork and Wiener[1] found that crawling the Internet breadth-first yields high-quality pages early in the crawl, when PageRank is used as the measure of quality.

See Also
  1. Breadth-First Crawling Yields High-Quality Pages, M. Najork and J. L. Wiener

last edited 2005-01-19 16:38:02 by CraigMacdonald