Stratified Sampling

    Top: The Average Precision-induced prior, averaged over many system lists.

    Second row: The bucketed prior. Each bucket contains m=14 items (in this example) and is associated with the sum of the distribution weights of its items.

    Third row: Buckets are sampled with replacement, yielding counts 7, 4, 2, 0, 1, 0 (summing to m=14).

    Bottom: Within each bucket, documents are sampled uniformly without replacement: 7 items from the first bucket, 4 from the second, and so on.
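The two-stage procedure in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function name `stratified_sample`, the `weights` argument (the prior in rank order), and the requirement that the list length be a multiple of m are all assumptions made here.

```python
import random


def stratified_sample(weights, m, rng=None):
    """Two-stage sample of m item indices (hypothetical helper).

    weights: prior weight of each item, in rank order.
    m: bucket size, also used as the total sample size.
    Returns (counts per bucket, sorted sampled indices).
    """
    rng = rng or random.Random()
    n = len(weights)
    if n % m != 0:
        # Assumption of this sketch: every bucket has exactly m items.
        raise ValueError("len(weights) must be a multiple of m")

    # Consecutive buckets of m items; each bucket's weight is the
    # sum of the prior weights of its items.
    buckets = [list(range(start, start + m)) for start in range(0, n, m)]
    bucket_weights = [sum(weights[i] for i in b) for b in buckets]

    # Stage 1: draw m buckets with replacement, proportional to weight.
    draws = rng.choices(range(len(buckets)), weights=bucket_weights, k=m)
    counts = [draws.count(b) for b in range(len(buckets))]

    # Stage 2: inside each bucket, sample uniformly without replacement.
    # A count never exceeds m, so each bucket has enough items.
    sample = []
    for bucket, count in zip(buckets, counts):
        sample.extend(rng.sample(bucket, count))
    return counts, sorted(sample)
```

For example, `stratified_sample([1.0 / (i + 1) for i in range(84)], 14)` splits 84 items into six buckets of 14 and returns six counts summing to 14, as in the figure. The 1/rank weights are only a stand-in for the actual prior.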

 

Sampling-based IR evaluation

The problem: large-scale IR evaluation

We consider the problem of large-scale retrieval evaluation, with a focus on the considerable effort required to judge tens of thousands of documents using traditional test collection construction methodologies.

  1. SIGIR 06

  2. infAP (CIKM 06)

  3. new technique

  4. thesis talk

  5. SIGIR talk

Evaluation

The latest estimator we use is the generalized ratio estimator, which is widely used in polling, election strategy, market research, etc. (Thompson02):
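As a reference point, the textbook form of this estimator for a population mean can be written as follows. The notation is an assumption made here for illustration: S is the sample, y_k is the value observed for sampled unit k, and π_k is its inclusion probability.

```latex
\hat{\mu} \;=\; \frac{\sum_{k \in S} y_k / \pi_k}{\sum_{k \in S} 1 / \pi_k}
```

The numerator is the Horvitz-Thompson estimate of the population total and the denominator is the corresponding estimate of the population size, so their ratio estimates the mean.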






Given a sample S of judged documents along with their inclusion probabilities, we discuss here how to estimate the quantities of interest (AP, R-precision, precision at cutoff).

For the AP estimate, which we view as the mean of a population of precision values, the generalized ratio estimator for unequal probability designs becomes:
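One plausible reading of this estimator in code is sketched below. It is an illustration under stated assumptions, not necessarily the exact formula used here. The assumptions are that each judged document is a `(rank, rel, pi)` tuple, that every sampled document appears in the system's ranked list, that the precision at each relevant rank is itself estimated by a ratio estimator, and that the number of relevant documents is estimated as the sum of rel/pi. The name `estimate_ap` is hypothetical.

```python
def estimate_ap(sample):
    """Ratio-style AP estimate from a judged sample (hypothetical helper).

    sample: iterable of (rank, rel, pi) tuples, where rank is the
    1-based position in the system's list, rel is 0 or 1, and pi is
    the document's inclusion probability (0 < pi <= 1).
    """
    judged = sorted(sample)  # process documents in rank order

    # Estimated number of relevant documents: sum of rel / pi.
    rel_total = sum(rel / pi for _, rel, pi in judged)
    if rel_total == 0:
        return 0.0

    size_so_far = 0.0    # estimated number of documents down to this rank
    rel_so_far = 0.0     # estimated number of relevant ones among them
    weighted_prec = 0.0  # sum of estimated precision / pi at relevant ranks
    for _, rel, pi in judged:
        size_so_far += 1.0 / pi
        rel_so_far += rel / pi
        if rel:
            # Ratio estimate of precision at this rank.
            weighted_prec += (rel_so_far / size_so_far) / pi

    # Mean of the precision values over the estimated relevant population.
    return weighted_prec / rel_total
```

With every document judged (all pi equal to 1), this reduces to ordinary average precision over the list, which gives a quick sanity check on the sketch.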









Results