CS7180: Special Topics in AI: Text Modeling for the Humanities and Social Sciences

Fall 2017

Class meeting: Tuesdays, 11:45-1:25, and Thursdays, 2:50-4:30, Ryder 124

Instructor: David Smith, Assistant Professor in Computer and Information Science (Office Hours: Thursdays, 12-2, or by appointment; WVH 356)

Course Description

Researchers and archivists have been digitizing the source materials for human history and culture for over half a century, but two further factors sped the emergence of the digital humanities and computational social sciences in the last decade. First, industrial-scale scanning projects have increased the available evidence beyond the ability of individual scholars to manage it; second, born-digital traces of our social, cultural, economic, and political lives have become practically archivable and searchable on a massive scale. Much of this data is text (“unstructured,” as the database people might say), providing opportunities for advances in natural language processing.

In this seminar, we will read and discuss papers about building models of text to answer questions in the humanities and social sciences. Students will take turns presenting and leading discussion of papers along with the relevant background material. All students will write short reviews of the papers we read and complete a course project and accompanying report.


There are no official prerequisites; however, students are expected to have some background in NLP, in machine learning, or in working with text computationally in the humanities or social sciences.


Each week, we will read roughly two papers on a common theme. The papers could be tied together by methodology—e.g., text categorization or convolutional neural networks—or by subject matter—e.g., criminology or plot analysis.

  1. September 7: Introduction: Human language and culture meet Big Government, Big Business, Big Science, and Big Data. I ended up talking about several books from, modally, 1983.
  2. September 12: Models of Text in the Social Sciences and Humanities
  3. September 14: Bags of words and other text representations
  4. September 19: Word vectors and distributed representations
  5. September 21: Text categorization
  6. September 26: Language models and topic models
  7. September 28: Dynamic models and temporal change
  8. October 3: Entity and relation extraction
  9. October 5: Discuss project ideas
  10. October 10: Plot and character
  11. October 12: Language and power relations
  12. October 17: Geographical and social variation
  13. October 19: Dialogue and argumentation
  14. October 24: Text reuse
  15. October 26: Information cascades
  16. October 31: Community structure and communication
  17. November 2: Exploratory data analysis
  18. November 7: Document analysis and recognition
  19. November 9: Preliminary project presentations
  20. November 14: Laws and legislatures
  21. November 16: Crime and harm
  22. November 21: Causal inference
  23. November 23: Thanksgiving: No Class
  24. November 28: Final project presentations
  25. November 30: Final project presentations
  26. December 5: Some other topic
  27. December 7: Interpretation