The May Institute: An Intersection Between Data Science and Life Science

July 25, 2018

by Aditi Peyush

Northeastern University’s College of Computer and Information Science hosted the fourth annual May Institute on Computation and Statistics for Mass Spectrometry and Proteomics from April 30 through May 11. The May Institute provides statistical and computational background to individuals in the life sciences.

To Dr. Olga Vitek of Northeastern University, an event of this scope is a necessary step toward integrating computational and life sciences and making a lasting impact. The event combined lectures, keynote speakers, and practical training to gear the information towards the participants’ backgrounds and interests. “I think that currently many disciplines, in particular life sciences, do not provide enough training in computer science, data science and statistics,” Dr. Vitek commented. The May Institute offers an introduction to statistical methods, algorithms, and data analysis methods tailored to this audience. The speakers and presenters strive to make their examples relatable to the participants’ daily work.

The audience included researchers from academic institutions and industry professionals who study biological systems and use mass spectrometry or similar technologies to examine the activity of these systems at the molecular level. Employees from companies that manufacture mass spectrometers also attended to understand how the data from their instruments is analyzed. “The common theme was that these are participants who rely on these technologies to generate and interpret their own data,” said Dr. Vitek. There has also been an increasing number of registrations from computer scientists who want to see how their training can contribute to problems in the life sciences.

A statistician by training, Dr. Olga Vitek was the coordinator of the event. “I think that statistics is key for any scientific area which has to work with data…so pretty much any scientific area,” said Dr. Vitek, who has spent the past few years collaborating with faculty from Northeastern University, ETH Zurich, and the University of Washington. She explained that the idea for the May Institute arose when she found that many of her colleagues had the right background to work with the data but needed more training. “We decided we’d offer some short courses in association with conferences, but soon realized that you need more time to really dive into the details.” With NIH funding secured, the two-week May Institute took shape. With the support of professors from University of Washington, University of Seattle, Harvard, ETH Zurich, and scientists from RStudio, the event has become a major source of collaboration.

A keynote speaker at the May Institute, Dr. Ruedi Aebersold, is a professor specializing in proteomics at ETH Zurich and a recent recipient of the Karger Medal from Northeastern University. The Aebersold lab, which studies the functioning of cells and organisms, is a pioneer in the field of proteomics. Dr. Aebersold felt that the event served a critical purpose: reducing the errors in research studies that can alter experimental results. “Life science research has been transformed over the last one to two decades by the development of data driven research approaches, exemplified by genomics,” said Dr. Aebersold. “These approaches generate enormous data volumes which cannot be analyzed without computing and statistics.” Dr. Aebersold tailored his lectures to this audience and worked to relate the computational tools to life sciences research.

Both Dr. Vitek and Dr. Aebersold agree that this event fostered teamwork through inquiry. “We also have to realize that the contents of the course taught has to be considered research in progress, and neither our understanding of the problems nor the available tools are perfect,” commented Dr. Aebersold. Dr. Vitek enthusiastically described a session in which participants brought their own data sets and analyzed them with the methods and tools they had learned as part of the program. The fundamental takeaway from this event is that statisticians and computer scientists should be equal partners in life sciences investigations in order to make strides in this area of research.

Dr. Vitek would like to thank Northeastern University for allowing her to work at the interface of different research fields, in particular the College of Computer and Information Science and the College of Science, as well as her research group members and collaborators.