Crawler

Alternative names: Spider, Robot

DESCRIPTION

A crawler is used in WebIR to retrieve documents from the Internet (primarily the WorldWideWeb) and save them to a collection, ready for an IR system to index.

HOW IT WORKS

Crawlers download web pages from the Internet, extract the links from their HTML, and queue the newly found URLs (on the URLFrontier) so that they are fetched in turn.
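The fetch, extract, queue cycle above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation. It assumes a breadth-first (FIFO) URLFrontier, an in-memory dict as the collection, and an injectable fetch function so that the example runs without network access. The names crawl, LinkExtractor and fetch are hypothetical. A real crawler would also need politeness delays, robots.txt handling and error recovery, all of which are omitted here.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects <a href> targets from a page, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and strip any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first crawl from the seed URLs.

    fetch(url) is a hypothetical callable that returns the page HTML as a
    string, or None if the page cannot be retrieved. Returns the collection:
    a dict mapping each fetched URL to its HTML, in the order it was fetched.
    """
    frontier = deque(seeds)  # the URL frontier: URLs waiting to be fetched
    seen = set(seeds)        # every URL ever queued, so none is queued twice
    collection = {}
    while frontier and len(collection) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue  # skip pages that could not be fetched
        collection[url] = html
        parser = LinkExtractor(url)
        parser.feed(html)
        parser.close()
        for link in parser.links:
            # Only follow web links (skips mailto:, javascript:, etc.).
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                frontier.append(link)
    return collection


# Usage: a tiny fake "web" stands in for real HTTP fetching.
pages = {
    "http://a.example/": '<a href="b.html">B</a> <a href="http://c.example/">C</a>',
    "http://a.example/b.html": '<a href="/">home</a> <a href="mailto:x@y">mail</a>',
    "http://c.example/": '<a href="http://a.example/b.html#top">B again</a>',
}
collection = crawl(["http://a.example/"], pages.get)
print(list(collection))
```

Using a deque as the frontier gives breadth-first order. Swapping it for a priority queue is one common way to make a crawler fetch "important" pages first. The seen set is what stops the crawler from revisiting pages that link to each other.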

ISSUES

EXAMPLES OF CRAWLERS

FUNDAMENTAL ARCHITECTURES

Data Structures:

FURTHER INFORMATION

Papers

Websites

Books

Recent Research

last edited 2005-03-17 19:35:18 by IadhOunis