Terrier/WikiForum

We've decided to concentrate on the wiki for community participation with Terrier. You can also discuss Terrier on the Terrier/LiveDoc/MailingLists.

Question: Why do we need to make sure the collection.spec file is built correctly (see Terrier/LiveDoc/TrecExample)? Is the collection.spec file created by the trec_setup.sh script?

From: GianniAmati

VassilisPlachouras: The trec_setup.sh script uses the find utility on Unix/Linux/Mac OS X systems, and the trec_setup.bat script uses the FileFind class on Windows, so that we obtain the absolute paths of all the files under a given directory.

If the directory you specify for the trec_setup script contains only the collection files, then it is not necessary to create the collection.spec file manually. If the directory where the collection files are stored contains other files as well, then you should check that the automatically generated collection.spec file lists only the collection files, and either edit it or create it manually.
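For illustration only, the sketch below does in plain Java roughly what trec_setup does: it walks a directory recursively and writes the absolute path of every file, one per line, into a collection.spec file. It is not the actual Terrier FileFind class, and the output location etc/collection.spec is an assumption for this example; adjust it to wherever your installation keeps the file, and remember to edit out any non-collection files afterwards.

  import java.io.File;
  import java.io.FileWriter;
  import java.io.IOException;
  import java.io.PrintWriter;

  // Illustrative sketch: list every file under a directory (recursively)
  // and write its absolute path, one per line, to etc/collection.spec.
  // This mimics what trec_setup does; it is not Terrier's own code.
  public class MakeCollectionSpec {

      static void listFiles(File dir, PrintWriter out) {
          File[] entries = dir.listFiles();
          if (entries == null)
              return;
          for (File entry : entries) {
              if (entry.isDirectory())
                  listFiles(entry, out);                // descend into subdirectories
              else
                  out.println(entry.getAbsolutePath()); // one absolute path per line
          }
      }

      public static void main(String[] args) throws IOException {
          File collectionDir = new File(args[0]);       // directory holding the collection files
          PrintWriter out = new PrintWriter(new FileWriter("etc/collection.spec"));
          listFiles(collectionDir, out);
          out.close();
      }
  }

Running it as, for example, java MakeCollectionSpec /path/to/collection produces a file you can compare against the one generated by trec_setup.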

Question: Has anyone used the distributed version of Terrier on the TREC Terabyte track to compare the efficiency of a language modelling approach against the DFR models, or conventional models?

More information: I do not have the Terabyte track data, and I would like to know whether the assignment of non-zero probabilities to terms increases the complexity of retrieval, especially with long queries or query expansion (QE).

From: GianniAmati

Question: Terabyte Track

More information: I am browsing the new versions of the Terabyte track papers. FUB did not participate in this track, and I do not yet have the results. However, I understood that Terrier was first on both long and short queries. I have seen a group claiming that they were first on title-only runs, and I have also seen somebody complaining that the results are not yet available. It seems that an official summary table from the organisers is still missing. Could somebody from GU clarify this point?

From: GianniAmati

IadhOunis: The Terabyte Track overview paper is now publicly available in the TREC 2004 Conference Proceedings. Terrier achieved the two best overall *official* Terabyte Track runs (whether the MAP, R-Prec or P@10 measures are considered). Both runs used long queries. For short queries, i.e. title-only, Terrier's run was second, although in an additional run we improved upon the obtained performance. However, the best title-only run did not achieve a better P@10 or R-Prec, which are the more Web-related evaluation measures. The results were made available/distributed well before the final proceedings deadline.

Question: Terrier Desktop Query language

Did you know that you can use pseudo query expansion in the query language of Terrier? For example, you can write:

"Web Information Retrieval" qe:on

The above will automatically expand your query "Web Information Retrieval" with related terms. The overhead of the query expansion is marginal.

You can also use parentheses, for example:

+Retrieval +DFR -(Web "data mining")

The above requires the terms 'Retrieval' and 'DFR', and excludes documents containing 'Web' and the phrase "data mining" from the results.
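As a further illustration, the two features could be combined in a single query. This combined form is an assumption based on the two examples above rather than on documented syntax:

+"Information Retrieval" +DFR -(Web "data mining") qe:on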

From: IadhOunis

Template for Questions

Question: goes here

More information: goes here

From: goes here



CategoryTerrier