SIGIR 2013: News Vertical Search Dataset

This dataset contains news-related queries, together with the documents ranked for those queries from multiple news and user-generated sources, collected between the 10th and the 16th of January 2013. It was designed for investigating real-time news vertical search in the paper 'News Vertical Search: When and What to Display to Users', published at SIGIR 2013 and accessible from:

Contents

The dataset comprises 3,446 files, each representing a single query topic issued at a specific point in time. Each file follows a fixed naming scheme:

<query source>_<query time>_<query number>_<creation time>.topic.csv

where <query source> is BitlyBurst, representing the Bitly bursting phrases stream; <query time> is the timestamp at which the query was identified; <query number> is the id of the query among those identified at that time point (queries are collected in batches); and <creation time> is the time the file was written (this is very close to the query time).
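The naming scheme above can be split into its four parts with a few lines of Python. This is a minimal sketch, not part of the dataset distribution: the names parse_topic_filename and TopicFileName are hypothetical, and it assumes that none of the four fields contains an underscore. The timestamp format is not specified here, so both times are kept as plain strings.

```python
from typing import NamedTuple

SUFFIX = ".topic.csv"


class TopicFileName(NamedTuple):
    """The four components of a topic file name (hypothetical helper type)."""
    query_source: str
    query_time: str      # kept as text: the timestamp format is not assumed
    query_number: str
    creation_time: str   # kept as text: the timestamp format is not assumed


def parse_topic_filename(name: str) -> TopicFileName:
    """Split '<query source>_<query time>_<query number>_<creation time>.topic.csv'.

    Assumption: no field contains an underscore, so the stem splits into
    exactly four parts.
    """
    if not name.endswith(SUFFIX):
        raise ValueError(f"not a topic file: {name!r}")
    stem = name[: -len(SUFFIX)]
    parts = stem.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 underscore-separated fields in {name!r}")
    return TopicFileName(*parts)
```

Raising an error on an unexpected number of parts, rather than guessing, makes it obvious if the underscore assumption does not hold for the real files.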

Each file is in CSV format. The first line is a comment that starts with a '#' character and provides additional information about the query. Each subsequent line lists a single document ranked for that query. The sources ranked are:

Each document line of the file contains the following fields:

"uniqueid","source","queryid","query","querydate","score","headline","summary","docdate"  

where "uniqueid" is a unique id for the query-document pair, "source" is the <query source> as before, "queryid" is a unique query identifier, "query" is the textual query string, "querydate" is the <query time>, "score" is the document score (if the associated API provides a score), "headline" is the title of the document, "summary" contains the contents of the document and "docdate" lists the time the document was created (if available).
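A topic file in the layout described above could be read with Python's standard csv module. This is a minimal sketch under stated assumptions: the function name read_topic is hypothetical, the files are assumed to have no header row after the '#' comment, and every document line is assumed to carry all nine fields in the order listed. Missing values, such as an unavailable score or docdate, are assumed to appear as empty strings.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# Field order as listed in this page.
FIELDS = [
    "uniqueid", "source", "queryid", "query", "querydate",
    "score", "headline", "summary", "docdate",
]


def read_topic(lines: Iterable[str]) -> Tuple[str, List[Dict[str, str]]]:
    """Return (comment, rows) from the lines of one topic file.

    Assumptions: the first line is the '#' comment, there is no header
    row, and each following record has exactly len(FIELDS) values.
    """
    it = iter(lines)
    first = next(it, "")
    if not first.startswith("#"):
        raise ValueError("expected a '#' comment on the first line")
    comment = first[1:].strip()

    rows: List[Dict[str, str]] = []
    for record in csv.reader(it):  # handles quoted fields containing commas
        if not record:             # skip blank lines
            continue
        if len(record) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(record)}")
        rows.append(dict(zip(FIELDS, record)))
    return comment, rows
```

To read from disk, pass an open file object, for example open(path, newline="", encoding="utf-8"). The newline="" argument is what the csv module recommends so that quoted fields spanning several lines are read correctly. The UTF-8 encoding is an assumption, as the page does not state the encoding of the files.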

Terms of Use

The dataset is provided free of charge 'as is' for research purposes. By using this dataset, you agree to cite the following publication in any future works and/or publications that use this dataset:

@inproceedings{mccreadie2013sigirVerticalSearch,
 author = {McCreadie, Richard and Macdonald, Craig and Ounis, Iadh},
 title = {News Vertical Search: When and What to Display to Users},
 booktitle = {Proceedings of SIGIR'13},
 year = {2013},
 location = {Dublin, Ireland},
 }

The dataset can be downloaded here