TREC Blog Track 2009
As discussed in the Blog track workshop at TREC 2008, the Blog track 2009 will make use of the new Blogs08 test collection, a larger and more up-to-date sample of the blogosphere, which covers a much longer time span than the Blogs06 collection.
The Blog track 2009 aims to investigate more refined and complex search scenarios in the blogosphere. In particular, following discussions at the Blog track workshop at TREC 2008, we propose to run the following tasks:
Faceted blog distillation: a more refined version of the blog distillation task that addresses the quality aspect of the retrieved blogs.
Top stories identification: A task that addresses news-related issues on the blogosphere.
A draft version of the final TREC 2009 Blog track overview is available online at http://www.dcs.gla.ac.uk/~craigm/publications/blogOverview2009.pdf
Faceted Blog Distillation Task
Blog search users often wish to identify blogs about a given topic, which they can subscribe to and read on a regular basis. This user task is most often manifested in two scenarios:
Filtering: The user subscribes to a repeating search in their RSS reader.
Distillation: The user searches for blogs with a recurring central interest, and then adds these to their RSS reader.
In the TREC Blog track, we have been investigating the latter scenario – blog distillation. The blog distillation task can be summarised as Find me a blog with a principal, recurring interest in X. For a given area X, systems should suggest feeds that are principally devoted to X over the timespan of the feed, and that a user would be recommended to subscribe to as interesting feeds about X (i.e. a user may be interested in adding them to their RSS reader).
In its TREC 2007 and TREC 2008 form, the blog distillation task only focuses on topical relevance. It does not address the quality aspect of the retrieved blogs. Following a position paper by Marti Hearst et al. in SSM 2008, we propose a refinement of the blog distillation task that takes into account a number of attributes or facets such as the authority of the blog, its opinionated nature, the trustworthiness of its authors, or the genre of the blog and its style of writing. The new faceted blog distillation task can be summarised as Find me a good blog with a principal, recurring interest in X. The task has the following characteristics:
It goes beyond topical-relevance
It integrates a quality aspect in the evaluation of the retrieved blogs
It mimics an exploratory search task
The facets will be allocated on a per-topic basis. Evaluation will be done as for the blog distillation task in 2008, with the caveat that blogs should be assessed on the facets active for a given topic.
Training can be done on the Blogs06 collection using the previous years' relevance assessments, albeit without facets.
We propose several facets for the TREC 2009 blog distillation task, which may be of varying difficulty to identify for participant systems. Topics will have facets of interest attached to them, but there will be a reasonable spread between all facets in use for this year. The facets that will be considered for TREC 2009 are:
1. Opinionated: Some bloggers may make opinionated comments on the topics of interest, while others report factual information. A user may be interested in blogs that are predominantly opinionated. For this facet, the values of interest are 'opinionated' vs 'factual' blogs.
2. Personal: Companies are increasingly using blogging as an activity for PR purposes. However, a user may not wish to read such mostly marketing or commercial blogs, and prefer instead to keep to blogs that appear to be written in personal time without commercial influences. For this facet, the values of interest are 'personal' vs 'official' blogs.
3. In-depth: Users might be interested to follow bloggers whose posts express in-depth thoughts and analysis on the reported issues, preferring these over bloggers who simply provide quick bites on these topics, without taking the time to analyse the implications of the provided information. For this facet, the values of interest are 'indepth' vs. 'shallow' blogs (in terms of their treatment of the subject).
For a given topic, the appropriate facet will be chosen by the TREC assessors during topic development.
In future incarnations of this task, systems may be asked to select automatically the facets they think are interesting for a given query.
For each topic, systems should supply the top 100 blogs which they think are both relevant to the topic, and which are likely to satisfy the first value (e.g. opinionated) of interest attached to the topic, followed by the second value (e.g. factual) of interest attached to the facet. In addition, for each topic, systems should provide a ranking of blogs where 'no facet value is applied' (denoted by 'none').
<top>
<num>1051</num>
<query>Example query</query>
<facet>personal</facet>
<description> longer statement of the information need </description>
<narrative> description </narrative>
</top>
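As an illustration only, topics in this format could be read with a few lines of Python. The sketch below assumes the tag names shown in the sample topic. The function name parse_topics and the FACET_VALUES table (including the spelling of its keys) are hypothetical choices made for this example; the value pairs themselves are taken from the facet descriptions above.

```python
import re

# Facet name -> (first value, second value), as listed in the facet descriptions.
# The key spellings are an assumption about how facets appear in the topic file.
FACET_VALUES = {
    "opinionated": ("opinionated", "factual"),
    "personal": ("personal", "official"),
    "indepth": ("indepth", "shallow"),
}

TOPIC_RE = re.compile(r"<top>(.*?)</top>", re.DOTALL)
FIELD_RE = re.compile(
    r"<(num|query|facet|description|narrative)>(.*?)</\1>", re.DOTALL
)


def parse_topics(text):
    """Return one dict per <top> block, keyed by tag name."""
    topics = []
    for block in TOPIC_RE.findall(text):
        fields = {name: value.strip() for name, value in FIELD_RE.findall(block)}
        topics.append(fields)
    return topics


sample = """<top> <num>1051</num> <query>Example query</query>
<facet>personal</facet>
<description> longer statement of the information need </description>
<narrative> description </narrative> </top>"""

topics = parse_topics(sample)
first_value, second_value = FACET_VALUES[topics[0]["facet"]]
print(topics[0]["num"], first_value, second_value)  # 1051 personal official
```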
Runs have the format detailed below. In particular, for each topic, you should produce three rankings of 100 blogs each: one for the first value of the facet enabled, one with the second value of the facet enabled, and one for a baseline ranking with no facet whatsoever enabled. For example, for the personal facet, the first ranking would have 100 blogs that your system thinks are Personal, the second ranking would have 100 blogs which your system thinks are Official, while the third ranking would have 100 blogs which your system thinks are relevant to the topic, without any consideration for the facet.
topic-facet_value1 Q0 docno rank sim runtag
....
topic-facet_value2 Q0 docno rank sim runtag
....
topic-facet_none Q0 docno rank sim runtag

1051-personal Q0 blog08-feed-00002 1 10 testRun
1051-personal Q0 blog08-feed-00001 2 9 testRun
...
1051-official Q0 blog08-feed-00501 1 10.1 testRun
1051-official Q0 blog08-feed-00112 2 9.2 testRun
...
1051-none Q0 blog08-feed-00001 1 20.1 testRun
1051-none Q0 blog08-feed-00041 2 17.1 testRun
...
1052...
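A minimal sketch of how a system might write out the three rankings for one topic in the run format above. The function name format_faceted_run, its parameters, and the scores are hypothetical; only the line layout (topic-facet value, Q0, docno, rank, sim, runtag) comes from these guidelines.

```python
from typing import Dict, List, Tuple


def format_faceted_run(
    topic: str,
    rankings: Dict[str, List[Tuple[str, float]]],
    runtag: str,
    depth: int = 100,
) -> List[str]:
    """Return run lines for one topic.

    `rankings` maps a facet value (e.g. 'personal', 'official', 'none')
    to a list of (feed_id, score) pairs in any order.
    """
    lines = []
    for facet_value, scored in rankings.items():
        # Sort by descending score and keep at most `depth` blogs.
        ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)[:depth]
        for rank, (feed_id, score) in enumerate(ranked, start=1):
            lines.append(f"{topic}-{facet_value} Q0 {feed_id} {rank} {score} {runtag}")
    return lines


example = format_faceted_run(
    "1051",
    {
        "personal": [("blog08-feed-00001", 9), ("blog08-feed-00002", 10)],
        "official": [("blog08-feed-00501", 10.1)],
        "none": [("blog08-feed-00001", 20.1)],
    },
    "testRun",
)
print("\n".join(example))
```

Ranks are derived from the scores rather than supplied separately, so the rank and sim columns cannot disagree.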
Participating groups may submit up to four runs for the faceted blog distillation task. We wholeheartedly encourage the submission of manual runs, which are invaluable in improving the quality of the collection. (An automatic run is one that involves no human interaction. In contrast, a manual run is one where, for example, you formulate queries, search manually, give relevance feedback, and/or rerank documents by hand.)
Topic development and relevance assessments for this task will be performed by NIST. We have actively pursued the option of obtaining query logs from a commercial search engine to assist the creation of realistic topics.
The following scale will be used for the assessment:
[-1] i.e. Not judged. The content of the blog was not examined due to offensive URLs or headers (such documents do exist in the collection due to spam). Although the content itself was not assessed, it is very likely, given the offensive headers, that the blog is irrelevant.
[0] i.e. Not relevant. The blog and its posts were examined, and do not show any interest in the target topic area, or refer to it only in passing.
[1] i.e. Relevant but facet value unknown.
[2] i.e. Relevant and clearly inclined towards first facet value.
[3] i.e. Relevant and clearly inclined towards second facet value.
The number of test targets is 50. Metrics will be precision/recall based, with mean average precision (MAP) as the most important metric.
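For reference, a sketch of how MAP could be computed over the three rankings of a topic. The mapping from assessment outcomes to binary relevance is an assumption made for this example (a first-value ranking counts only blogs judged relevant and inclined towards the first value, and so on); the guidelines do not spell it out. The label strings and function names are likewise hypothetical.

```python
from typing import Dict, List, Set

# Assumed mapping: which judgment labels count as relevant for each ranking.
RELEVANT_LABELS = {
    "first": {"relevant_first"},
    "second": {"relevant_second"},
    "none": {"relevant_unknown", "relevant_first", "relevant_second"},
}


def relevant_set(judgments: Dict[str, str], ranking_kind: str) -> Set[str]:
    """Blogs that count as relevant for the given kind of ranking."""
    wanted = RELEVANT_LABELS[ranking_kind]
    return {blog for blog, label in judgments.items() if label in wanted}


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of precision values at the ranks where relevant blogs occur."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, blog in enumerate(ranking, start=1):
        if blog in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(aps: List[float]) -> float:
    """MAP is the arithmetic mean of per-topic average precision."""
    return sum(aps) / len(aps) if aps else 0.0


judgments = {
    "feedA": "relevant_first",
    "feedB": "not_relevant",
    "feedC": "relevant_unknown",
    "feedD": "relevant_second",
}
ranking = ["feedB", "feedA", "feedC"]

ap_first = average_precision(ranking, relevant_set(judgments, "first"))
ap_none = average_precision(ranking, relevant_set(judgments, "none"))
print(ap_first, ap_none)
```

In this toy example the first-value ranking scores 0.5 (the single relevant blog is at rank 2), while the no-facet ranking scores 7/18 because one of its three relevant blogs is never retrieved.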
Top Stories Identification Task
The query logs from the commercial search engines show that there is a fair number of news-related queries, suggesting that Blog search users have an interest in the blogosphere response to news stories as they develop.
We propose to run a new pilot search task addressing the news dimension in the blogosphere: For a given unit of time (e.g. date), systems will be asked to identify the top news stories (similar to what is displayed on the main page of Google Blog Search or Google News), and provide a list of relevant blog posts discussing each news story. The ranked list of blog posts should have a diverse nature, covering different/diverse aspects or opinions of the news story.
Participating System: Inputs & Output
Participating groups will be provided with a large sample of news headlines and their corresponding dates from throughout the timespan of the Blogs08 corpus. Participants will also have access to the Blogs08 corpus, from which they can extract relevant date information.
In response to a date "query", systems should provide a ranking of 100 headlines that they think were important on the specified day. Moreover for each headline, they should provide a ranking of 10 blog posts which are relevant to and discuss the news story headline.
The dates of the provided headlines will be the ones used by the news broadcaster. For example, a story that happens in Europe very early in the morning of day d may be dated d-1 by an American news broadcaster. Because of this possible disparity between the date on which the headline was issued and the date on which the story actually happened, participating systems should rank all headlines dated within one day of the query date d (i.e. headlines on day d-1, day d, and day d+1).
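The one-day window around the query date can be sketched as follows. How dates are attached to headlines in the distributed corpus is not shown here, so the (headline id, date) pairs, the function name candidate_headlines, and the example dates are assumptions made purely for illustration.

```python
from datetime import date, datetime, timedelta
from typing import List, Tuple


def candidate_headlines(
    headlines: List[Tuple[str, date]],
    query_date: str,
    window_days: int = 1,
) -> List[str]:
    """Return ids of headlines dated within `window_days` of the query date.

    `query_date` uses the YYYYMMDD form of the <date> topic field.
    """
    d = datetime.strptime(query_date, "%Y%m%d").date()
    earliest = d - timedelta(days=window_days)
    latest = d + timedelta(days=window_days)
    return [hid for hid, hdate in headlines if earliest <= hdate <= latest]


# Hypothetical headline dates, for illustration only.
headlines = [
    ("BLOG08-NEWS-0000001", date(2008, 4, 23)),
    ("BLOG08-NEWS-0000002", date(2008, 4, 24)),
    ("BLOG08-NEWS-0000003", date(2008, 4, 26)),
]
candidates = candidate_headlines(headlines, "20080424")
print(candidates)  # ['BLOG08-NEWS-0000001', 'BLOG08-NEWS-0000002']
```

Only headlines that survive this filter need to be scored for importance, since headlines outside the window cannot be judged relevant for the query date.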
On the other hand, note that relevant blog posts may naturally be posted on or after the date of the news headline, but even shortly before the provided headline date (recall the possible time disparity). They just have to be on topic, i.e. related to the news headline. The blog posts selected for a given headline should be diverse in that they discuss different aspects, perspectives or opinions of the news story.
Importantly, the aim of the task is to ascertain the usefulness of the blogosphere in real-time news identification. Moreover, as the headline information is available on the Web, groups should use only the data provided, and not resort to external news resources or systems to enrich their system's knowledge. When external resources - beyond the Blogs08 collection and the provided sample of headlines and their corresponding dates - are used, these should be clearly mentioned. Runs using external resources will be reported separately.
Sample news headline corpus:
BLOG08-NEWS-0000001 News headline 1 here
BLOG08-NEWS-0000002 News headline 2 here
...
<top>
<num>1110</num>
<date>20080424</date>
</top>
...
System responses follow a format similar to that of the TREC Enterprise track Expert Search task. For each headline, the response includes a list of supporting relevant blog posts (at most 10) that discuss the news story and cover its various aspects.
Sample system response:
1110 Q0 BLOG08-NEWS-0000002 1 10.0 runtag
SUPPORT BLOG08-20080426-000258281 1 1.5 runtag
SUPPORT BLOG08-20080426-000333190 2 1.3 runtag
1110 Q0 BLOG08-NEWS-0010056 2 9.8 runtag
...
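To make the nesting of headline and SUPPORT lines concrete, the sketch below reads a response in the sample format into a list of headlines, each with its supporting posts. The function name parse_top_stories_run and the dictionary keys are hypothetical, and the parser assumes complete lines only (no elision marks such as "...").

```python
from typing import Dict, List


def parse_top_stories_run(lines: List[str]) -> List[Dict]:
    """Group each SUPPORT line under the headline line that precedes it."""
    results = []
    current = None
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "SUPPORT":
            if current is None:
                raise ValueError("SUPPORT line before any headline line")
            _, post_id, rank, score, _runtag = fields
            current["support"].append((post_id, int(rank), float(score)))
        else:
            topic, _q0, headline_id, rank, score, runtag = fields
            current = {
                "topic": topic,
                "headline": headline_id,
                "rank": int(rank),
                "score": float(score),
                "runtag": runtag,
                "support": [],
            }
            results.append(current)
    return results


sample_run = [
    "1110 Q0 BLOG08-NEWS-0000002 1 10.0 runtag",
    "SUPPORT BLOG08-20080426-000258281 1 1.5 runtag",
    "SUPPORT BLOG08-20080426-000333190 2 1.3 runtag",
    "1110 Q0 BLOG08-NEWS-0010056 2 9.8 runtag",
]
parsed = parse_top_stories_run(sample_run)
print(len(parsed), len(parsed[0]["support"]))  # 2 2
```

A check that no headline carries more than 10 supporting posts could be added on top of this structure before submission.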
Participating groups may submit up to four runs for the top stories identification task. Each run consists of a ranking of 100 headlines, and their corresponding supporting relevant posts.
Assessors will use multiple sources of evidence to answer three questions: (i) What are the top news stories for a given day? (ii) Which blog posts are relevant to a given news story? (iii) Which aspects of the news story do the blog posts discuss?
1. News Story Headline Assessment: Only headlines published within one day of the query date d (i.e. on day d-1, d, or d+1) can be judged relevant. Assessors will decide, using various sources of evidence, what the top stories were for a given day.
2. Blog Post Assessment: For each top news story, assessors will decide on the relevant blog posts discussing the news story.
3. Relevant Blog Post Diversity Assessment: For the relevant blog posts for a news story, assessors will group these posts into topics covering various aspects of the news story.
The number of test targets will be 50. Evaluation will use precision/recall measures based on correct story headlines, with MAP as the most important metric.
The 2nd level evaluation will examine how good each system is at identifying relevant related blog posts. In this 2nd evaluation, we will also score by MAP. However, (similar to the TREC 2009 Web track), we will also examine diversity - systems will be penalised for retrieving blog posts which do not add any information/perspectives to those already retrieved.
9th April: Blogs08 collection ready for distribution
Mid-May: Search tasks defined
6th July: Blog distillation topics available
7th July: Top Stories ID task queries available
28th August: Top Stories ID task runs due
31st August: Blog distillation task runs due
5th September: News stories ID participant assessment phase starts
30th September: News stories ID participant assessment phase ends
History of Document
March 04, 2009: first draft
April 22, 2009: draft of faceted blog distillation task guidelines added
April 28, 2009: draft of top news story identification task guidelines added
June 22, 2009: timeline updated
August 7, 2009: updated run formats.