Northeastern University has received a $500,000 grant from the Andrew W. Mellon Foundation to develop the Proteus toolset for information retrieval and visualization. Building on their existing strengths in information extraction, retrieval, and visualization, Northeastern University’s NULab for Texts, Maps, and Networks and collaborators at the University of Massachusetts Amherst’s Center for Intelligent Information Retrieval will build software tools that help digital humanities researchers explore large, unstructured collections of historical books, newspapers, and other documents.
Many research projects begin with the work of assembling a corpus of relevant documents. Although scholars today have access to an unprecedented amount of source material from mass digitization projects by Google, the Internet Archive, the Library of Congress, and others, a single subject heading or search-engine query in these archives is unlikely to capture all of the materials relevant to a long-term scholarly research effort. Users of the Proteus system will build up collections interactively and incrementally by analyzing networks of text reuse among books, passages, authors, and journals; providing feedback on terms, phrases, named entities, and metadata; and exploring these growing collections with the interactive Bookworm full-text visualization tool.
The NULab for Texts, Maps, and Networks is Northeastern University’s center for faculty and student research in the digital humanities and computational social sciences. Professors David Smith, Ryan Cordell, Elizabeth Maddock Dillon, and Benjamin Schmidt are the principal investigators for this project.
The Center for Intelligent Information Retrieval, in the University of Massachusetts Amherst’s School of Computer Science, is one of the leading research groups in information retrieval and information extraction. Professor James Allan is the principal investigator for this project.