Our method of query simulation (He & Ounis, ECIR 2005) is inspired by query-based sampling (Callan & Connell, 2001). The two approaches differ in how they use the top-ranked documents: our method applies a term weighting model to extract the most informative terms from them and formulate a query, whereas query-based sampling uses them to obtain samples of the collection. Our query simulation method, which generates a query of X terms, proceeds as follows:
1. Randomly choose a seed-term from the vocabulary.
2. Rank the documents in the collection using the seed-term as the query.
3. Extract the X-1 most informative terms from the Y top-ranked documents using a term weighting model, where Y is a parameter of the query simulation method. Any term weighting model from the literature can be used at this stage, e.g. the Bo1 term weighting model from the Divergence From Randomness (DFR) framework.
4. To avoid selecting a junk term as the seed-term, take the most informative of the terms extracted in step 3 as the new seed-term. The original seed-term is discarded at this stage.
5. Repeat steps 2 and 3 with the new seed-term: rank the documents using the new seed-term, then extract the X-1 most informative terms from the Y top-ranked documents.
6. The simulated query consists of the new seed-term and the X-1 terms extracted in step 5.
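The steps above can be sketched in Python as a minimal illustration. Several details are assumptions not fixed by the method description: documents are lists of tokens; the hypothetical `rank()` function simply orders documents by the frequency of the query term (a real system would use a proper retrieval model); Bo1 is the term weighting model in steps 3 and 5, computed as w(t) = tf_x · log2((1 + P_n) / P_n) + log2(1 + P_n) with P_n = F / N; and the current query term is excluded from the extracted terms. All function names are illustrative.

```python
import math
import random
from collections import Counter


def bo1_weights(top_docs, collection_freq, num_docs):
    """Bo1 weight of every term occurring in the top-ranked documents.

    tf_x: frequency of the term in the top documents,
    F: frequency of the term in the whole collection, N: number of documents.
    """
    tf_x = Counter()
    for doc in top_docs:
        tf_x.update(doc)
    weights = {}
    for term, tf in tf_x.items():
        p_n = collection_freq[term] / num_docs
        weights[term] = tf * math.log2((1 + p_n) / p_n) + math.log2(1 + p_n)
    return weights


def rank(collection, term):
    """Hypothetical ranking: documents containing `term`, most occurrences first."""
    scored = [(doc.count(term), i) for i, doc in enumerate(collection) if term in doc]
    scored.sort(key=lambda s: (-s[0], s[1]))  # ties broken by document index
    return [collection[i] for _, i in scored]


def extract_terms(collection, collection_freq, query_term, k, y):
    """Steps 2-3: rank with `query_term`, return the k most informative other terms."""
    top_docs = rank(collection, query_term)[:y]
    weights = bo1_weights(top_docs, collection_freq, len(collection))
    weights.pop(query_term, None)  # assumption: the query term is not re-extracted
    return sorted(weights, key=lambda t: (-weights[t], t))[:k]


def simulate_query(collection, x, y, rng):
    """Return a simulated query of (at most) x terms."""
    collection_freq = Counter(t for doc in collection for t in doc)
    seed = rng.choice(sorted(collection_freq))                    # step 1
    first = extract_terms(collection, collection_freq, seed, x - 1, y)  # steps 2-3
    if not first:
        raise ValueError("seed-term co-occurs with no other term")
    new_seed = first[0]                                           # step 4: discard old seed
    rest = extract_terms(collection, collection_freq, new_seed, x - 1, y)  # step 5
    return [new_seed] + rest                                      # step 6


if __name__ == "__main__":
    docs = [
        "information retrieval query expansion".split(),
        "query expansion term weighting".split(),
        "retrieval model term weighting".split(),
        "divergence from randomness model".split(),
    ]
    print(simulate_query(docs, x=3, y=2, rng=random.Random(42)))
```

Re-selecting the seed-term in step 4 is what makes the sketch robust to a poor random draw: the final query is built around a term that the weighting model already judged informative, rather than around the random seed itself.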