
Sarthak Jain

PhD Student


Office Location

440 Huntington Avenue
208 West Village H
Boston, MA 02115


  • BT in Computer Engineering, Delhi Technological University – India

About Me

  • Hometown: Delhi, India
  • Field of Study: Natural Language Processing, Machine Learning
  • PhD Advisor: Byron Wallace


Sarthak Jain is a PhD student in the natural language processing and machine learning programs at Northeastern University, advised by Professor Byron Wallace. Jain’s research focuses on using machine learning and deep learning methods to address the challenges and answer the queries that healthcare providers face when working with complex medical literature. Jain analyzes documents from the medical domain to gather accurate and useful information for professional use. Prior to joining Khoury College of Computer Sciences, Jain earned his Bachelor of Technology degree from Delhi Technological University.

What are the specifics of your graduate education (thus far)?

I earned my undergraduate degree in Computer Engineering at Delhi Technological University.

What are your research interests?

My research interests center on the challenges that arise in automatically parsing large amounts of knowledge grounded in human language. Given the complexity, ambiguity, and uncertainty of language, I want to solve the problems of extracting, representing, and generalizing such knowledge using machine learning and deep learning methods.

What’s one problem you’d like to solve with your research/work?

My main focus is the analysis of documents generated in the medical domain. My goal is to build a system that assists healthcare providers in reviewing existing literature and collates information from a large body of clinical text to answer their very specific queries with high accuracy and certainty.

What aspect of what you do is most interesting?

The most important and interesting aspect of my work is that in a field like medicine, you need to provide theoretical guarantees about the efficacy of your systems, whereas most machine learning systems take the form of a black box whose inner workings rarely provide any guarantees. If we want these systems to be adopted more widely, we need to open up these black boxes and make them easier for end users to interpret.

What are your research or career goals, going forward?

My plan is to work with people from both academia and industry to further the use of NLP techniques for knowledge extraction and representation in critical domains like medicine.