TREC Blog Track: Crawling Current Status
We are currently fetching blogs and news feeds on the WWW to support the TREC Blog Track. Each crawler sends at most one request every 2 seconds to any single host IP address (every 1 second for very large blog hosts). The '/robots.txt' file is re-fetched once every 25 days. Feeds are polled at least once a week.
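The politeness intervals above can be sketched as a small per-host scheduler. This is a minimal sketch under stated assumptions, not the crawler's actual implementation: the names `PolitenessScheduler` and `feed_poll_overdue` are hypothetical, timestamps are plain seconds supplied by the caller, and how a host gets classified as "very large" is left to the caller because the notice does not say.

```python
from dataclasses import dataclass, field

# Intervals taken from the notice, in seconds.
DEFAULT_HOST_DELAY = 2.0            # one request every 2 s per crawler per host IP
LARGE_HOST_DELAY = 1.0              # 1 s for very large blog hosts
ROBOTS_TTL = 25 * 24 * 3600         # /robots.txt re-fetched every 25 days
FEED_MAX_INTERVAL = 7 * 24 * 3600   # feeds polled at least once a week


@dataclass
class PolitenessScheduler:
    """Hypothetical per-crawler scheduler keyed by host IP address."""
    # Assumption: the caller decides which IPs count as very large blog hosts.
    large_hosts: set = field(default_factory=set)
    last_fetch: dict = field(default_factory=dict)      # IP -> time of last request
    robots_fetched: dict = field(default_factory=dict)  # IP -> time robots.txt was fetched

    def host_delay(self, ip: str) -> float:
        """Minimum gap between two requests to the same host IP."""
        return LARGE_HOST_DELAY if ip in self.large_hosts else DEFAULT_HOST_DELAY

    def can_fetch(self, ip: str, now: float) -> bool:
        """True if enough time has passed since the last request to this IP."""
        last = self.last_fetch.get(ip)
        return last is None or now - last >= self.host_delay(ip)

    def record_fetch(self, ip: str, now: float) -> None:
        self.last_fetch[ip] = now

    def robots_stale(self, ip: str, now: float) -> bool:
        """True if robots.txt was never fetched or is at least 25 days old."""
        fetched = self.robots_fetched.get(ip)
        return fetched is None or now - fetched >= ROBOTS_TTL

    def record_robots(self, ip: str, now: float) -> None:
        self.robots_fetched[ip] = now


def feed_poll_overdue(last_polled: float, now: float) -> bool:
    """True once a feed has gone a full week without being polled."""
    return now - last_polled >= FEED_MAX_INTERVAL
```

For example, with the documentation-range addresses `198.51.100.7` (an ordinary host) and `203.0.113.10` (marked as a large host), a request at t=100 s blocks the ordinary host until t=102 s but frees the large host again at t=101 s. Keying the delay on the resolved IP rather than the hostname means many blogs served from one machine share a single budget.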
We are collecting:
Feeds (RSS or Atom), including comment feeds
Blog homepages
Blog permalinks
If you have any questions or concerns about this activity, please feel free to email me: craigm{@at@}