SIGIR 2013 : News Vertical Search Dataset

This dataset contains news-related queries, together with the documents ranked for those queries, drawn from multiple news and user-generated sources between the 10th and the 16th of January 2013. It was designed for investigating real-time news vertical search in the paper 'News Vertical Search: When and What to Display to Users', published at SIGIR 2013 and accessible from:


The dataset comprises 3,446 files, each representing a single query topic issued at a specific point in time. Each file follows a fixed naming scheme:

<query source>_<query time>_<query number>_<creation time>.topic.csv

where <query source> is BitlyBurst (the Bitly bursting phrases stream), <query time> is the timestamp at which the query was identified, <query number> is the id of the query among those identified at that time point (queries are collected in batches), and <creation time> is the time the file was written (very close to the query time).
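As an illustration of the naming scheme, the following minimal Python sketch splits a file name into its four fields. It assumes that none of the fields contains an underscore and keeps the two timestamps as plain strings, since their exact format is not stated here. The function name, the field names and the example file name are hypothetical and not part of the dataset.

```python
from typing import NamedTuple

SUFFIX = ".topic.csv"


class TopicFileName(NamedTuple):
    query_source: str   # e.g. "BitlyBurst"
    query_time: str     # kept as a string: timestamp format is not assumed
    query_number: str   # id of the query at that time point
    creation_time: str  # kept as a string: timestamp format is not assumed


def parse_topic_filename(name: str) -> TopicFileName:
    """Split '<source>_<query time>_<number>_<creation time>.topic.csv'."""
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a topic file: {name!r}")
    stem = name[: -len(SUFFIX)]
    parts = stem.split("_")
    # Assumption: fields never contain '_', so exactly four parts are expected.
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields: {name!r}")
    return TopicFileName(*parts)


# Hypothetical file name, for illustration only.
example = parse_topic_filename("BitlyBurst_1357800000_3_1357800005.topic.csv")
print(example.query_source, example.query_number)
```

If real file names turn out to contain extra underscores, the split would need to be adapted.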

Each file is in CSV format. The first line is a comment that starts with a '#' character and provides additional information about the query. Each subsequent line lists a single document ranked for that query. The sources ranked are:

Each line of the file contains the following information:

where "uniqueid" is a unique id for the query-document pair, "source" is the <query source> as before, "queryid" is a unique query identifier, "query" is the textual query string, "querydate" is the <query time>, "score" is the document score (if the associated API provides one), "headline" is the title of the document, "summary" contains the content of the document, and "docdate" is the time the document was created (if available).
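The file layout described above can be read with a short Python sketch along the following lines. It rests on two assumptions that should be checked against the actual files: that the columns appear in the order in which the fields are named above, and that there is no header row after the '#' comment line. The function name and the column list constant are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# Assumption: column order follows the order of the field descriptions.
ASSUMED_COLUMNS = [
    "uniqueid", "source", "queryid", "query", "querydate",
    "score", "headline", "summary", "docdate",
]


def read_topic(lines: Iterable[str],
               columns: List[str] = ASSUMED_COLUMNS
               ) -> Tuple[str, List[Dict[str, str]]]:
    """Return the '#' comment text and one dict per ranked document."""
    it = iter(lines)
    first = next(it, "")
    if not first.startswith("#"):
        raise ValueError("expected the first line to be a '#' comment")
    comment = first.lstrip("#").strip()
    # csv.reader handles quoted fields that contain commas.
    docs = [dict(zip(columns, row)) for row in csv.reader(it) if row]
    return comment, docs
```

A file could then be loaded with `open(path, newline="", encoding="utf-8")` and passed directly to `read_topic`; the encoding is also an assumption. All values are left as strings, because "score" and "docdate" may be empty when the source does not provide them.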

=== Terms of Use ===

The dataset is provided free of charge, 'as is', for research purposes. By using this dataset, you agree to cite the following publication in any future work or publication that uses it:
 author = {McCreadie, Richard and Macdonald, Craig and Ounis, Iadh},
 title = {News Vertical Search: When and What to Display to Users},
 booktitle = {Proceedings of SIGIR'13},
 year = {2013},
 location = {Dublin, Ireland},

The dataset can be downloaded here

