TREC-BLOG/TREC2008

CONTENTS

This wiki page provides all details about the TREC Blog track campaign run in 2008. It serves as the archive of the TREC 2008 Blog track. Details about the new TREC Blog track 2009 campaign can be found in TREC-BLOG.

Further details about the 2008 edition can be found in the TREC 2008 Blog track 'Overview paper', which will appear in the Proceedings of TREC 2008 after it completes the WERB process. On a point of information, an updated draft of the paper is available at [WWW] http://www.dcs.gla.ac.uk/~ounis/blogOverview2008.pdf

  1. Search Tasks
    1. Baseline Adhoc Retrieval Task
    2. Opinion Retrieval Task
    3. Blog Post Tasks Assessment
    4. Blog Post Retrieval Task Submissions
    5. Blog Distillation (Feed Search) Task
  2. Track Timeline

Search Tasks

In TREC 2008, two main tasks ran: the opinion finding task and the blog distillation task. Following our conclusions from the TREC 2006 and 2007 Blog track campaigns, we structured the TREC Blog track into the four search tasks described below.

The Blog06 test collection was used in TREC 2008 (see [WWW] http://ir.dcs.gla.ac.uk/test_collections/blog06info.html ).

Baseline Adhoc Retrieval Task

The baseline adhoc retrieval task involves locating blog posts that contain relevant information about a given topic target. This task will use the same topics as the other two blog post retrieval tasks; however, all opinion-finding retrieval techniques should be turned off.

This task corresponds to the topic-relevance baseline runs from the TREC 2007 overview paper.

Opinion Retrieval Task

The opinion retrieval task involves locating blog posts that express an opinion about a given target. It can be summarised as: What do people think about <target>? It is a subjective task. The target can be a "traditional" named entity -- a name of a person, location, or organization -- but also a concept (such as a type of technology), a product name, or an event. Note that the topic of the post does not necessarily have to be the target, but an opinion about the target must be present in the post or one of the comments to the post.

For example, for the target "skype":

Excerpt from relevant, opinionated post (permalink [WWW] http://gigaom.com/2005/12/01/skype-20-eats-its-young/):

Excerpt from unopinionated post (permalink [WWW] http://www.slashphone.com/115/3152.html):

Evaluation will be by standard IR evaluation measures, such as MAP. For each submitted run, you should designate which topic-relevance baseline run it corresponds to. For training, participants could use the 100 queries from the previous campaigns, with their associated relevance judgements, available from [WWW] http://trec.nist.gov/data/blog06.html.

Polarity Task

The polarity task will run again this year, but in a different form, closer to a user task in which, for a given query, a system retrieves both positive and negative opinionated documents or sentences and categorises them in the user display (Opinmind.com used to do this).

For this year's polarity task, for each topic, your system should retrieve and rank all the positive opinionated documents. Then for each topic, your system should retrieve and rank all the negative opinionated documents. For submission, these two runs should be concatenated together in one run file, separated by a blank line. Note that mixed opinionated documents, i.e. documents labelled [3] in the relevance assessment procedure, should not be listed in the positive (resp. negative) rankings of retrieved documents.

Evaluation will be by standard IR evaluation measures, such as MAP. For each submitted run, you should designate which topic-relevance baseline run it corresponds to. For training, participants could use the 100 queries from the previous campaigns, with their associated relevance judgements, available from [WWW] http://trec.nist.gov/data/blog06.html.

Blog Post Tasks Assessment

We will use the same assessment procedure as defined in 2006 and used in 2007. The retrieval unit is documents from the permalink component of the Blog06 test collection. The content of a blog post is defined as the content of the post itself and the contents of all comments to the post: if the relevant content is in a comment, then the permalink is declared to be relevant. Note that blogs and non-blogs will be treated equally in this task.

The following scale will be used for the assessment:

 *[-1] i.e. Not judged.  The content of the post was not
    examined due to an offensive URL or header (such documents do exist
    in the collection due to spam).  Although the content itself was not assessed,
    it is very likely, given the offensive header, that the post is
    irrelevant.

 *[0] i.e. Not relevant.  The post and its comments were
    examined, and do not contain any information about the target,
    or refer to it only in passing.

 *[1] i.e. Relevant.  The post or its comments contain
    information about the target, but do not express an opinion
    towards it.  To be assessed as "Relevant", the information given
    about the target should be substantial enough to be included in a
    report compiled about this entity.

If the post or its comments are not only on target, but also contain an explicit expression of opinion or sentiment about the target, showing some personal attitude of the writer(s), then judge the document using the labels below.

 *[2] i.e. Relevant, negative opinions. The post contains an explicit expression of opinion or sentiment about the target, showing some personal attitude of the writer(s), and the opinion expressed is explicitly negative about, or against, the target.

 *[3] i.e. Relevant, mixed positive and negative opinions. Same as [2], but contains both positive and negative opinions.

 *[4] i.e. Relevant, positive opinion. Same as [2], but the opinion expressed is explicitly positive about, or supporting, the target.
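
The relevance judgements for the TREC 2006 and 2007 topics are typically distributed in the standard qrels layout (topic, iteration, docno, label). Assuming that layout, the snippet below is a minimal, hypothetical sketch of how the scale above maps onto the three blog post retrieval tasks; the qrels file name is an assumption.

  # Hypothetical sketch: split a Blog track qrels file by assessment label.
  # Assumes the standard qrels layout: "topic iteration docno label".
  from collections import defaultdict

  relevant = defaultdict(set)       # labels 1-4: on-topic posts (baseline adhoc task)
  opinionated = defaultdict(set)    # labels 2-4: on-topic and opinionated (opinion task)
  positive_only = defaultdict(set)  # label 4: positive opinions (polarity task)
  negative_only = defaultdict(set)  # label 2: negative opinions (polarity task)

  with open("qrels.blog.txt") as qrels:   # file name is an assumption
      for line in qrels:
          topic, _, docno, label = line.split()
          label = int(label)
          if label >= 1:
              relevant[topic].add(docno)
          if label >= 2:
              opinionated[topic].add(docno)
          if label == 4:
              positive_only[topic].add(docno)
          elif label == 2:
              negative_only[topic].add(docno)
          # label 3 (mixed) is deliberately kept out of both polarity sets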

Evaluation

The number of new test targets will be 50 (topics numbered 1001-1050); however, the topic set will also include the 100 queries from the TREC 2006 and TREC 2007 opinion finding tasks (i.e. 150 topics in total). For all blog post retrieval tasks, evaluation measures will be precision/recall based, with MAP as the most important measure.
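
For reference, a minimal sketch of the average precision computation underlying MAP (the ranking and judged-relevant set passed in are illustrative; trec_eval computes this for you):

  def average_precision(ranked_docnos, relevant_docnos):
      """Sum of precision@k at each rank k where a relevant document is
      retrieved, divided by the total number of relevant documents."""
      hits, precision_sum = 0, 0.0
      for k, docno in enumerate(ranked_docnos, start=1):
          if docno in relevant_docnos:
              hits += 1
              precision_sum += hits / k
      return precision_sum / len(relevant_docnos) if relevant_docnos else 0.0

  # MAP is the mean of average_precision over all topics in the set.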

Blog Post Retrieval Task Submissions

Baseline Adhoc Retrieval Task

You can submit up to 2 topic-relevance baselines using all 150 topics. One baseline must be an automatic, title-only run.

The usual trec_eval format will be used for all blog post retrieval task submissions. The submission file should contain lines of the format

  topic Q0 docno rank sim runtag

where

  topic is the topic number
  Q0 is a literal "Q0"  (a historical field...)
  docno is the document number of the retrieved permalink document
  rank is the rank at which the system returned the document (1 .. n)
  sim is the system's similarity score
  runtag is the run's identifier string
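
As an illustration only, a minimal sketch for writing a run file in this format (the run tag, topic number, document numbers and scores below are made-up placeholders):

  # Minimal sketch: write one run file line per retrieved document, in the
  # "topic Q0 docno rank sim runtag" format described above.
  def write_run(path, runtag, results_by_topic):
      """results_by_topic maps a topic number to a list of (docno, score)
      pairs, already sorted by decreasing score."""
      with open(path, "w") as out:
          for topic in sorted(results_by_topic):
              for rank, (docno, score) in enumerate(results_by_topic[topic], start=1):
                  out.write(f"{topic} Q0 {docno} {rank} {score:.4f} {runtag}\n")

  # Placeholder document numbers; real runs use docnos from the Blog06 collection.
  write_run("FooBase1.run", "FooBase1",
            {1001: [("PLACEHOLDER-DOCNO-1", 12.7), ("PLACEHOLDER-DOCNO-2", 11.3)]})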

Opinion Finding Task

For the opinion finding task, you are permitted to submit up to 4 runs which must be based on your two previously submitted baseline task runs. These will use all 150 topics. One of these 4 submitted runs must be an automatic, title-only run. If your system cannot be clearly broken down into baseline and opinion-finding features, then you can select "N/A" as the baseline run (though this will make it difficult for you to see the advantage of your opinion finding features).

In addition, TREC will provide 5 standard topic-relevance baseline runs previously submitted. You should attempt to apply your opinion-finding techniques to as many of these 5 baseline runs as possible. This will enable you to have different baselines, and to assess the impact of your opinion-finding technique independently of your initial baseline, hence determining the validity of your conclusions. Each of the provided 5 standard baseline runs could be seen as a black box engine, which has returned a list of relevant documents when all opinion finding features have been switched off.

You MAY submit up to 4 runs for EACH of the provided five standard baselines. Please make it clear in your run descriptions how to compare these runs to the runs based on your own baselines (e.g., "FooRun3's opinion reranker, using baseline2." - FooRun3 being a run that uses your own baseline).

Since you may submit up to 24 runs in total, please take care how you assign priorities for pooling. Please only give high priority to the runs you really want pooled (3 runs at most), and lower priority to others.

Run formats are as for the baseline adhoc retrieval task.

Polarity Task

For the polarity task, you are permitted to submit up to 2 runs, which must be based on your two previously submitted baseline runs. These will use all 150 topics. One of these 2 submitted runs must be an automatic, title-only run. If your system cannot be clearly broken down into baseline and polarity-detection features, then you can select "N/A" as the baseline run (though this will make it difficult for you to see the advantage of your polarity-detection features).

In addition, TREC will provide the same 5 standard baseline runs as for the opinion finding task. You should attempt to apply your polarity detection techniques to as many of these 5 baseline runs as possible. Again, each of the provided 5 standard baseline runs could be seen as a black box engine, which has returned a list of relevant documents when all opinion finding features have been switched off.

You MAY submit up to 2 runs for EACH of the provided five standard baselines. Please make it clear in your run descriptions how to compare these runs to the runs based on your own baselines (e.g., "FooRunPol3's Polarity reranker, using baseline2." - FooRunPol3 being a run that uses your own baseline).

Since you may submit up to 12 runs in total, please take care how you assign priorities for pooling. Please only give high priority to the runs you really want pooled (3 runs at most), and lower priority to others.

As described above, the run formats for the polarity task are as for the baseline adhoc and opinion finding tasks, except that each run file contains the output for both positive opinionated posts and negative opinionated posts, separated by a blank line.
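
A minimal sketch of assembling such a polarity run file from separately produced positive and negative rankings (the file names are placeholders; the two rankings are assumed to already be in the usual run format):

  # Minimal sketch: concatenate the positive and negative rankings of a
  # polarity run into a single file, separated by one blank line.
  def build_polarity_run(positive_run_path, negative_run_path, output_path):
      with open(positive_run_path) as pos, open(negative_run_path) as neg, \
           open(output_path, "w") as out:
          positive_text = pos.read()
          if not positive_text.endswith("\n"):
              positive_text += "\n"
          out.write(positive_text)
          out.write("\n")   # the blank line separating the two rankings
          out.write(neg.read())

  build_polarity_run("FooRunPol1.positive", "FooRunPol1.negative", "FooRunPol1.run")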

Blog Distillation (Feed Search) Task

Blog search users often wish to identify blogs about a given topic, which they can subscribe to and read on a regular basis. This user task is most often manifested in two scenarios:

In the TREC Blog track, we have been investigating the latter scenario: blog distillation. The blog distillation task can be summarised as: Find me a blog with a principal, recurring interest in X. For a given area X, systems should suggest feeds that are principally devoted to X over the timespan of the feed, and which a user would be recommended to subscribe to as an interesting feed about X (i.e. the user may be interested in adding it to their RSS reader).

This task is particularly interesting for the following reasons:

While the definition of blog distillation as explained above is different, the idea is to provide users with the key blogs about a given topic. Note that point (iii) is not applicable in a blog setting.

Operationality

As in TREC 2007, the blog distillation task will follow the community judging model, where the topics are proposed and judged by the participating groups.

Proposed assessment guidelines are available at TREC-BLOG/BlogDistillationAssessmentGuidelines.

Topic Development Phase

We need each participating group to create 6 topics for this task. Your aim is to identify some topics 'X', and a few (e.g. 2 or 3) relevant feeds (identified by their feedno) for each. Your topic area should be specific enough that there are not likely to be hundreds or thousands of relevant feeds (so 'cars' is probably too vague a topic). Once you have decided on your topics, send them, with the two or three relevant feeds, in an email to ian.soboroff (AT SYMBOL) nist.gov. PLEASE DO NOT POST THEM TO THE MAILING LIST, OR TO THE ORGANISERS LIST.

Format:

<top>
<title> a short query-ish title </title>

<desc> Description:
The desc is a sentence-length description of what you are looking for, and should include the title words.
</desc>

<narr> Narrative:
The narr is a paragraph-length description of what you are looking for.  Use it to give details on what feeds or blogs are relevant and what feeds or blogs are not.  If there are "gray areas", state them here.
</narr>

<feeds>
feedno
feedno
feedno
</feeds>

<comments>
Anything else you want to say.
</comments>

</top>

When choosing topics, please avoid topics that are too general and would lead to a very large number of relevant feeds being identified. For example, the topic 'Linux' is too general for this collection; a better topic might be 'linux filesystems'. Also avoid topics with temporal aspects: for example, Christmas is an external event to which the blogosphere will likely react, but very few blogs will have a recurring interest in that event.

Example:

<top>
<title> solaris </title>

<desc> Description:
Blogs describing experiences administrating the Solaris operating system, or its new features or developments.
</desc>

<narr> Narrative:
Relevant blogs will post regularly about administrating or using the Solaris operating system from Sun, or its latest features or developments. Blogs with posts about Solaris the movie are not relevant, nor are blogs which only have a few posts about Solaris.</narr>

<feeds>
BLOG06-feed-053948
BLOG06-feed-078402
BLOG06-feed-018020
</feeds>

<comments>
None.
</comments>

</top>
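
A minimal, hypothetical sketch for checking that a topic file in the format above is well formed before sending it (the required tag list comes from the template above; everything else is an assumption):

  # Hypothetical sketch: check that a blog distillation topic file contains
  # the tags from the template above and two or three feed numbers.
  import re
  import sys

  REQUIRED_TAGS = ["<top>", "<title>", "</title>", "<desc>", "</desc>",
                   "<narr>", "</narr>", "<feeds>", "</feeds>",
                   "<comments>", "</comments>", "</top>"]

  def check_topic(path):
      text = open(path).read()
      missing = [tag for tag in REQUIRED_TAGS if tag not in text]
      if missing:
          print("missing tags:", ", ".join(missing))
          return False
      feeds_block = text.split("<feeds>")[1].split("</feeds>")[0]
      feeds = re.findall(r"BLOG06-feed-\d+", feeds_block)
      if not 2 <= len(feeds) <= 3:
          print("expected two or three feednos, found", len(feeds))
          return False
      return True

  if __name__ == "__main__":
      check_topic(sys.argv[1])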

All topics should be submitted to Ian by end of Monday 23rd June.

Topic Development System

To help the participating groups create their blog distillation topics, we have provided a standard search system for *documents* on the Blog06 collection. It also displays the feed of each document, and you can view all the documents for a given feed. You can access it from: [WWW] http://ir.dcs.gla.ac.uk/terrier/search_blogs06/ . The engine will be taken down once the topic development phase ends.

If you have your own search system for the Blogs06 collection (say, from last year's track), feel free to use that.

You don't need to state all the relevant feeds for a topic, as there will be a separate assessment phase in September, after all runs have been submitted.

Evaluation

Participants can submit up to 4 runs to the blog distillation task. As usual, at least one automatic, title-only run is required. Each submitted run should have feeds ranked by their likelihood of having a principal (recurring) interest in the topic. For each submitted run, we ask that at most 100 feeds are returned per topic.

The submitted runs to the blog distillation task will follow the usual trec_eval format, i.e.

  topic Q0 feedno rank sim runtag

where

  topic is the topic number
  Q0 is a literal "Q0"  (a historical field...)
  feedno is the feed number of the blog (BLOG06-feed-......)
  rank is the rank at which the system returned the feed (1 .. n)
  sim is the system's similarity score
  runtag is the run's identifier string
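
A minimal sketch (the file handling and function name are assumptions) for checking the per-topic limit of 100 feeds and the feedno field before submitting:

  # Minimal sketch: verify that a blog distillation run file returns at most
  # 100 feeds per topic and uses BLOG06 feed numbers in the feedno field.
  from collections import Counter

  def check_distillation_run(path, max_per_topic=100):
      per_topic = Counter()
      ok = True
      with open(path) as run:
          for lineno, line in enumerate(run, start=1):
              topic, q0, feedno, rank, sim, runtag = line.split()
              per_topic[topic] += 1
              if not feedno.startswith("BLOG06-feed-"):
                  print(f"line {lineno}: unexpected feedno {feedno}")
                  ok = False
      for topic, count in per_topic.items():
          if count > max_per_topic:
              print(f"topic {topic}: {count} feeds returned (limit {max_per_topic})")
              ok = False
      return ok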

Evaluation is likely to be by Mean Average Precision. In order to identify the key blogs, there may be multiple levels of relevance during assessments (i.e. highly relevant, key, relevant, non-relevant), allowing measures such as NDCG to be used.
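
Should graded judgements of that kind be used, a minimal sketch of NDCG follows; the gain assigned to each relevance level is an illustrative assumption, not a track decision:

  # Minimal sketch of NDCG for graded feed judgements, e.g. with gains
  # key = 3, highly relevant = 2, relevant = 1, non-relevant = 0 (illustrative).
  import math

  def dcg(gains):
      return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

  def ndcg(ranked_gains, all_judged_gains):
      """ranked_gains: the gain of each retrieved feed, in ranked order.
      all_judged_gains: the gains of every judged feed for the topic."""
      ideal = dcg(sorted(all_judged_gains, reverse=True)[:len(ranked_gains)])
      return dcg(ranked_gains) / ideal if ideal > 0 else 0.0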

Track Timeline
