TREC Blog Track: Current Crawling Status

We are currently fetching blogs and news feeds from the Web to support the TREC Blog Track. Our fetch rate is one request every 2 seconds per crawler per unique host IP address (one request per second for very large blog hosts). Each host's '/robots.txt' file is re-collected once every 25 days, and feeds are polled at least once a week.
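For webmasters curious how a schedule like this can be enforced, here is a minimal sketch of a per-crawler politeness scheduler. It is not our crawler's actual code. The class and function names, the idea of identifying "very large blog hosts" by IP address, and the injectable clock are all illustrative assumptions. Only the intervals (2 s, 1 s, 25 days, 7 days) come from the policy above.

```python
import time
from typing import Callable, Dict, Optional, Set

DEFAULT_DELAY = 2.0                 # seconds between fetches to one host IP
LARGE_HOST_DELAY = 1.0              # seconds, for very large blog hosts
ROBOTS_TTL = 25 * 24 * 3600         # robots.txt re-collected every 25 days
FEED_MAX_INTERVAL = 7 * 24 * 3600   # feeds polled at least once a week


class PolitenessScheduler:
    """Hypothetical per-crawler scheduler enforcing per-host-IP delays."""

    def __init__(self, large_hosts: Set[str],
                 clock: Callable[[], float] = time.monotonic):
        # Assumption: large blog hosts are identified by their IP address.
        self.large_hosts = large_hosts
        self.clock = clock
        self.next_allowed: Dict[str, float] = {}    # host IP -> earliest next fetch
        self.robots_fetched: Dict[str, float] = {}  # host IP -> last robots.txt fetch

    def delay_for(self, host_ip: str) -> float:
        # Shorter delay for very large hosts, default delay otherwise.
        return LARGE_HOST_DELAY if host_ip in self.large_hosts else DEFAULT_DELAY

    def can_fetch(self, host_ip: str) -> bool:
        # A host never seen before may be fetched immediately.
        return self.clock() >= self.next_allowed.get(host_ip, 0.0)

    def record_fetch(self, host_ip: str) -> None:
        # Block further requests to this IP until the delay has elapsed.
        self.next_allowed[host_ip] = self.clock() + self.delay_for(host_ip)

    def robots_stale(self, host_ip: str) -> bool:
        # robots.txt must be (re)collected if never fetched or older than 25 days.
        last = self.robots_fetched.get(host_ip)
        return last is None or self.clock() - last >= ROBOTS_TTL

    def record_robots(self, host_ip: str) -> None:
        self.robots_fetched[host_ip] = self.clock()


def feed_due(last_polled: Optional[float], now: float) -> bool:
    """True if a feed has not been polled within the last week."""
    return last_polled is None or now - last_polled >= FEED_MAX_INTERVAL
```

Keying the delay on host IP rather than hostname means that many blogs served from the same machine share one request budget, which is what the "per unique host IP address" wording implies.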

We are collecting:

Feeds (RSS or Atom), including comment feeds

Blog homepages

Blog permalinks

If you have any questions or concerns about this activity, please feel free to email me at craigm{@at@}dcs.gla.ac.NOSPAM.uk