WebIR is the general research field that applies InformationRetrieval techniques to search on the WorldWideWeb. This may involve traditional techniques, such as QueryExpansion and StatisticalModelling, as well as examining the structure and metadata of the documents, or analysing the hyperlinks between them.
TREC Test Web Collections - the WT2G, WT10G, DOTGOV, and DOTGOV2 collections. If you are experimenting with InformationRetrieval systems in a Web context, or are interested in large-scale IR evaluation, these crawls are close to essential. Because TREC provides queries and relevance assessments for these collections, you can use them to tune and evaluate your system or approach.
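As a rough illustration of how the relevance assessments can be used for evaluation, here is a minimal Python sketch that computes mean average precision for a ranked run. It assumes the usual whitespace-separated TREC layouts (qrels lines as "topic iteration docno relevance", run lines as "topic Q0 docno rank score tag"). The function names, topic numbers, document ids, and the "demo" run tag are made up for the example, and ties in score are simply left in input order, which may differ from what an official evaluation tool does. For reported results you would normally use the standard TREC evaluation software rather than a sketch like this.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Collect the relevant docnos per topic from qrels-style lines.

    Assumed layout: topic iteration docno relevance
    """
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, rel = parts
        if int(rel) > 0:
            relevant[topic].add(docno)
    return relevant


def parse_run(lines):
    """Build a ranked docno list per topic from run-style lines.

    Assumed layout: topic Q0 docno rank score tag
    Documents are ordered by descending score; ties keep input order
    (an assumption of this sketch).
    """
    scored = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        topic, _q0, docno, _rank, score, _tag = parts
        scored[topic].append((float(score), docno))
    return {
        topic: [docno for _, docno in sorted(pairs, key=lambda p: -p[0])]
        for topic, pairs in scored.items()
    }


def average_precision(ranking, relevant):
    """Average of precision values at the rank of each relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(run, qrels):
    """Mean of average precision over topics with at least one relevant doc."""
    topics = [t for t in qrels if qrels[t]]
    if not topics:
        return 0.0
    return sum(average_precision(run.get(t, []), qrels[t]) for t in topics) / len(topics)


# Hypothetical toy data, not taken from any real collection.
QRELS_LINES = [
    "451 0 DOC-A 1",
    "451 0 DOC-B 0",
    "451 0 DOC-C 1",
    "452 0 DOC-D 1",
]
RUN_LINES = [
    "451 Q0 DOC-A 1 3.0 demo",
    "451 Q0 DOC-B 2 2.0 demo",
    "451 Q0 DOC-C 3 1.0 demo",
    "452 Q0 DOC-E 1 2.0 demo",
    "452 Q0 DOC-D 2 1.0 demo",
]

if __name__ == "__main__":
    qrels = parse_qrels(QRELS_LINES)
    run = parse_run(RUN_LINES)
    print(f"MAP = {mean_average_precision(run, qrels):.4f}")
```

To tune a system, you could produce one run per parameter setting, score each with mean_average_precision, and keep the setting with the highest score, ideally checking it on a separate set of topics.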