Optimal Allocation of Crowdsourced Resources for IR Evaluation
Javed A. Aslam
Evaluating the performance of information retrieval systems such as
search engines is critical to their effective development. Current
"gold standard" performance evaluation methodologies generally rely on
the use of expert assessors to judge the quality of documents or web
pages retrieved by search engines, at great cost in time and
expense. The advent of "crowdsourcing," such as that available through
Amazon's Mechanical Turk service, holds out the promise that these
performance evaluations can be performed more rapidly and at far less
cost through the use of many (though generally less skilled) "crowd
workers"; however, the quality of the resulting performance
evaluations generally suffers greatly. The thesis of this project is
that one can obtain the best of both worlds---performance evaluations
with the quality of experts but at the cost of crowd workers---by
optimally leveraging both experts and crowd workers in asking the
"right" assessor the "right" question at the "right" time. For
example, one might ask inexpensive crowd workers what are likely to be
"easy" questions while reserving what are likely to be "hard"
questions for the expensive experts. While the project focuses on the
performance evaluation of search engines as its use case, the
techniques developed will be more broadly applicable to many domains
where one wishes to efficiently and effectively harness experts and
crowd workers with disparate levels of cost and expertise.
To enable the vision described above, a probabilistic framework will
be developed within which one can quantify the uncertainty about a
performance evaluation as well as the cost and expected utility of
asking any assessor (expert or crowd worker) any question (e.g. a
nominal judgment for a document or a preference judgment between two
documents) at any time. The goal is then to ask the "right" question
of the "right" assessor at any time in order to maximize the expected
utility gained per unit cost incurred and then to optimally aggregate
such responses in order to efficiently and effectively evaluate
retrieval system performance.
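As a rough illustration of this selection principle, the sketch below greedily asks the (assessor, question) pair with the highest expected utility per unit cost, subject to a total budget, asking each question at most once. All assessor names, utility values, and costs here are invented for illustration; the framework described above would estimate these quantities probabilistically rather than take them as given.

```python
import heapq

def select_assessments(candidates, budget):
    """Greedily choose (assessor, question) pairs by expected utility per cost.

    candidates: list of dicts with keys 'assessor', 'question',
                'expected_utility', and 'cost'.
    budget: total assessment cost allowed.
    Returns the chosen (assessor, question) pairs in selection order.
    """
    # Max-heap on utility-per-cost (negated, since heapq is a min-heap).
    heap = [(-c["expected_utility"] / c["cost"], i)
            for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    chosen, spent, answered = [], 0.0, set()
    while heap:
        _, i = heapq.heappop(heap)
        c = candidates[i]
        # Skip questions already answered and pairs that would bust the budget.
        if c["question"] in answered or spent + c["cost"] > budget:
            continue
        chosen.append((c["assessor"], c["question"]))
        answered.add(c["question"])
        spent += c["cost"]
    return chosen

# Toy pool: the cheap crowd worker wins the "easy" question, while the
# expensive expert is reserved for the "hard" one.
pool = [
    {"assessor": "crowd",  "question": "easy-q", "expected_utility": 0.80, "cost": 0.1},
    {"assessor": "expert", "question": "easy-q", "expected_utility": 0.90, "cost": 1.0},
    {"assessor": "crowd",  "question": "hard-q", "expected_utility": 0.05, "cost": 0.1},
    {"assessor": "expert", "question": "hard-q", "expected_utility": 0.90, "cost": 1.0},
]
print(select_assessments(pool, budget=1.5))
# → [('crowd', 'easy-q'), ('expert', 'hard-q')]
```

A greedy ratio rule like this is only a stand-in for the project's goal of maximizing expected utility gained per unit cost; in practice the expected utilities would be updated as responses arrive.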
Project Personnel

- Javed A. Aslam (PI)
- Virgil Pavlu (Research Scientist)
- Matt Ekstrand-Abueg (PhD student)
- Maryam Bashir (PhD student)
- Pavel Metrikov (PhD student)
- Jesse Anderton (PhD student)
- Cheng Li (PhD student)
- Bingyu Wang (PhD student)
Publications

A Comprehensive Method for Automating Test Collection Creation and Evaluation for Retrieval and Summarization Systems
PhD Thesis, College of Computer and Information Science, Northeastern University, 2017.
A Study of Realtime Summarization Metrics
In Proceedings of the 25th ACM International Conference on Information and Knowledge Management (CIKM), pages 2125-2130. ACM Press, 2016.
An Empirical Study of Skip-gram Features and Regularization for Learning on Sentiment Analysis
In Advances in Information Retrieval: 38th European Conference on IR Research (ECIR), pages 72-87. Lecture Notes in Computer Science, Vol. 9626. Springer, March 2016.
Relevance Assessment (Un-)Reliability in Information Retrieval: Minimizing Negative Impact
PhD Thesis, College of Computer and Information Science, Northeastern University, 2016.
TREC 2015 Temporal Summarization Track Overview
In Proceedings of The Twenty-Fourth Text REtrieval Conference (TREC 2015), NIST Special Publication SP 500-319, 2015.
Aggregation of Crowdsourced Ordinal Assessments and Integration with Learning to Rank: A Latent Trait Model
In Proceedings of the 24th ACM International Conference on Information and Knowledge Management (CIKM), pages 1391-1400. ACM Press, October 2015.
Optimally Selecting and Combining Assessment and Assessor Types for Information Retrieval Evaluation
PhD Thesis, College of Computer and Information Science, Northeastern University, 2015.
TREC 2014 Temporal Summarization Track Overview
In Proceedings of The Twenty-Third Text REtrieval Conference (TREC 2014), NIST Special Publication SP 500-308, 2014.
TREC Temporal Summarization Track
Acknowledgment and Disclaimer
This material is based upon work supported by the National Science
Foundation under Grant No. IIS-1421399. Any opinions, findings, and
conclusions or recommendations expressed in this material are those of
the author(s) and do not necessarily reflect the views of the National
Science Foundation (NSF).