CS6200/IS4200: Information Retrieval

Fall 2019

This course provides an overview of the important issues in information retrieval, and how those issues affect the design and implementation of search engine software. The course emphasizes the technology used in Web search engines and the information retrieval theories and concepts that underlie all search applications. Mathematical experience including basic probability is strongly desirable.

Instructor: David Smith, Associate Professor in Computer Science (Office Hours: Fridays 12–2 or by appointment; WVH 356)

Teaching assistants (office hours and rooms TBA):

Class meeting: Tuesdays and Fridays, 9:50 – 11:30 a.m., Shillman 105

Other information, lecture notes, and questions are available on the Piazza discussion board.

Course Texts

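There is no single required text for the course; however, we will refer to two texts, and we strongly suggest following the readings from one or both of them: Search Engines by Croft, Metzler, and Strohman (CMS) and Introduction to Information Retrieval by Manning, Raghavan, and Schütze (MRS).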


Assignments will be posted here.


This schedule is subject to change. Check back as the class progresses or consult the lecture notes on Piazza.

CMS refers to Search Engines by Croft, Metzler, and Strohman; MRS refers to Introduction to Information Retrieval by Manning, Raghavan, and Schütze.

  1. Overview of Information Retrieval (6 Sept. 2019)
  2. Architecture of a Search Engine
  3. Acquiring Data
  4. Processing Text
  5. Ranking with Indexes
  6. Queries and Interfaces
  7. Retrieval Models
  8. Evaluating Search Engines
  9. Classification and Clustering
  10. Networks of People and Search Engines
  11. Possible Further Topics
  12. Review

Course Policies


There will be a midterm and a final examination. The midterm will be administered in class (date TBA), will require about one hour, and will constitute 20% of the course grade. The final will be administered on Friday, 6 December, and will also constitute 20% of the course grade. Some questions will differ between CS6200 and IS4200.


There will be four assignments, each making up 10% of the course grade. All assignments will be equally weighted. Instructions and due dates will be posted on the course website as they are assigned. Some of the problems will be difficult, and it will often be helpful to discuss them with others. Feel free to form study groups; however, the idea is for everyone to understand the problems and experience working through the solutions, so you may not simply “give” a solution to (or copy a solution from) another classmate. In general, each student must write up his or her own code and homework solutions and must not read or copy the solutions of others. If you work with others on a problem, you must note with whom you discussed the problem at the beginning of your solution write-up.

The homeworks will consist mostly of programming exercises to implement various components of a search engine. We will usually also ask for your output on certain datasets and a short report describing your design choices and experimental results. Some questions will differ between CS6200 and IS4200.

There will be one course project, making up 20% of the course grade. The project will be designed for working in teams. The due date, to be announced, will be about one week before the last class.
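As a quick check on how the components above combine, here is a minimal sketch of the weighting in Python. The weights come from the policies on this page; the function name `course_grade`, the 0-100 score scale, and the simple weighted sum are assumptions made for illustration, not an official grading script.

```python
# Weights as stated in the course policies (percent of the course grade).
WEIGHTS = {
    "midterm": 20,
    "final": 20,
    "assignments": 40,  # four assignments at 10% each
    "project": 20,
}


def course_grade(midterm, final, assignments, project):
    """Combine component scores (each on a 0-100 scale) into a weighted total.

    `assignments` is a list of four scores, each counting equally.
    This is an illustrative weighted sum, not the official calculation.
    """
    if len(assignments) != 4:
        raise ValueError("expected four assignment scores")
    assignment_avg = sum(assignments) / len(assignments)
    total = (WEIGHTS["midterm"] * midterm
             + WEIGHTS["final"] * final
             + WEIGHTS["assignments"] * assignment_avg
             + WEIGHTS["project"] * project)
    return total / 100
```

The weights sum to 100, so a student with the same score on every component receives that score as the course total.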

Late policy: Assignments are due at the announced due date and time, usually 11:59 p.m. You will be granted one homework extension of four calendar days, to be used at your discretion, without having to ask. This single extension is meant to smooth over unforeseen crunches in your schedule, and you cannot simply distribute the four late days among four assignments. After the first late assignment, unexcused late assignments will be penalized 20% per calendar day late. We normally will not accept assignments after the date on which the following assignment is due or after the solutions have been handed out, whichever comes first. If you know in advance of circumstances that would cause you to turn in an assignment late, please contact the instructor before the assignment is due to ask if an extension is possible.
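The late policy can be read as a small rule, sketched below in Python. The four-day extension and the 20% per day penalty come from the policy above. Everything else is an assumption made for illustration: the function name `late_penalty`, that the first late submission automatically uses the extension, that days beyond the four-day extension are penalized at the same rate, and that the penalty is capped at 100%. The cutoff for accepting work at all (the next assignment's due date or the release of solutions) is not modeled.

```python
EXTENSION_DAYS = 4        # one extension of four calendar days (from the policy)
PENALTY_PER_DAY = 20      # percent deducted per calendar day late (from the policy)


def late_penalty(days_late, extension_used):
    """Return (percent deducted, extension_used after this submission).

    Assumptions, not stated in the policy: the first late submission
    consumes the extension, days past the extension are penalized,
    and the deduction never exceeds 100 percent.
    """
    if days_late <= 0:
        return 0, extension_used
    if not extension_used:
        # First late assignment: the single extension absorbs up to four days.
        penalized_days = max(0, days_late - EXTENSION_DAYS)
        extension_used = True
    else:
        # Extension already spent: every late day is penalized.
        penalized_days = days_late
    return min(100, PENALTY_PER_DAY * penalized_days), extension_used
```

For example, under these assumptions a first late assignment turned in three days late loses nothing, while a later assignment turned in two days late loses 40%.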

Academic Integrity

All work submitted for credit must be your own.

You may discuss the homework problems or projects with your classmates, the TAs, and the instructor. You must acknowledge the people with whom you discussed your work, and you must write up your own code and solutions.

Any written sources used (apart from the text) must also be acknowledged; however, you may not consult any solutions from previous years' assignments, whether they are student- or faculty-generated.

Accommodations for Students with Disabilities

If you have a disability-related need for reasonable academic accommodations in this course and have not yet met with a Disability Specialist, please visit www.northeastern.edu/drc and follow the outlined procedure to request services.

If the Disability Resource Center has formally approved you for an academic accommodation in this class, please present the instructor with your “Professor Notification Letter” during the first two weeks of the semester, so that we can address your specific needs as early as possible.