# CS6200: Information Retrieval

## Homework 3

Assigned: Monday, 28 Ocbtober 2019
Due: Friday, 15 November 2019, 11:59 p.m.

### Problems

1. [10 points] Query transformation (or query rewriting) modifies the original query so that it better expresses the query intent. For each of the following examples, name the transformation and give a brief description what it does:
1. electric car $\rightarrow$ electric car cars
2. facebook lira $\rightarrow$ facebook libra
3. middle east turmoil $\rightarrow$ middle east turmoil syria iraq
4. new york times subscription $\rightarrow$ "new york times" subscription
5. tapas $\rightarrow$ tapas Cambridge Massachusetts
2. [15 points] Language modeling
1. What is the purpose of smoothing in the query likelihood retrieval model?
2. The likelihood of a query $Q$ with Jelinek-Mercer smoothing is: $p(Q \mid D) = \prod_{i=1}^n [(1 - \lambda) \frac{f_{q_i,D}}{|D|} + \lambda \frac{c_{q_i}}{|C|}]$ Explain what types of documents tend to be retrieved when the $\lambda$ parameter for the corpus model is close to 0. Also, how would the ranking change when the lambda parameter approaches 1?
3. Estimating a relevance model provides a formal justification for pseudo-relevance feedback. Explain pseudo-relevance feedback and what happens to the query when using this technique.
3. [10 points] Term dependence
1. Two popular retrieval models with term dependence are the sequential dependence model and the full dependence model. For the query “russian president vladimir putin”, what features does the full dependence model add that do not occur in the sequential dependence model?
2. In naïve Bayes classification models, as in BM25 and unigram retrieval models, terms are independent of each other given the documents classification. Say that we want to add bigram features to a naïve Bayes classifier for positive vs. negative movie reviews. In addition to features like, ‘martin’ and ‘charlie’ and ‘sheen’, we would also have features like ‘martin sheen’ and ‘charlie sheen’. How could these features help improve classification? If the terms were indeed independent given the classification, what can you say about the relationship between p(‘martin’ | ‘sheen’) and p(‘martin’) in the collection of, say, positive reviews?
4. [15 points] Evaluation
1. The web search industry uses so-called PEGFB relevance judgments—perfect, excellent, good, fair, and bad—together with the NDCG metric to evaluate search output. Why is this metric well suited to evaluating web search using this type of judgments?
2. What is a disadvantage in using mean average precision (MAP) to evaluate web searches? Explain how retrieving relevant documents at low ranks (i.e., a long way down the ranked list) will decrease MAP.
3. Assume that you are evaluating search engines A and B with MAP. Describe how you would use some particular significance test to describe the relative performance of A and B.