440 Huntington Avenue
310G West Village H
Boston, MA 02115
ATTN: Kylie Bemis, 202 WVH
360 Huntington Avenue
Boston, MA 02115
- Statistical computing environments
- Methods for big, complex data, especially datasets with non-trivial correlation structures or that integrate data from multiple sources
- PhD in statistics, Purdue University
- MS in applied statistics, Purdue University
- BS in statistics and mathematics, Purdue University
Kylie Bemis is a lecturer in the Khoury College of Computer Sciences at Northeastern University. In 2013, she interned at the Canary Center for Cancer Early Detection at Stanford University, where she developed the Cardinal software package for statistical analysis of mass spectrometry imaging experiments. In 2015, she was awarded the John M. Chambers Statistical Software Award by the American Statistical Association for her work on Cardinal.
In 2016, she joined Olga Vitek’s lab, Statistical Methods for Studies of Biomolecular Systems, as a postdoctoral fellow. In 2019, she joined the Northeastern faculty, where she now teaches data science and develops curriculum for the master’s program in data science.
While at Purdue University, Kylie served as president of the Purdue chapter of the American Indian Science and Engineering Society and secretary of the Native American Student Association. She is active in outreach to the Native American and LGBTQ+ communities. She is an enrolled member of the Zuni tribe, and her hobbies include writing fiction and poetry.
ABI Innovation: Scalable and Agile Analysis of Mass Spectrometry Experiments
Mass spectrometry is a diverse and versatile technology for high-throughput functional characterization of proteins, small molecules and metabolites in complex biological mixtures. The technology evolves rapidly and generates datasets of increasing complexity and size. This rapid evolution must be matched by an equally fast evolution of statistical methods and tools developed for analysis of these data. Ideally, new statistical methods should leverage the rich resources available from over 12,000 packages implemented in the R programming language and its Bioconductor project. However, technological limitations now hinder their adoption for mass spectrometric research. In response, the ROCKET project builds an enabling technology for working with large mass spectrometric datasets in R and for rapidly developing new algorithms, while benefiting from advancements in other areas of science. It also offers an opportunity to recruit and retain Native American students to work with R-based technology and research, and helps prepare them for a career in STEM.
Instead of implementing yet another data processing pipeline, ROCKET builds an enabling technology for extending the scalability of R, and streamlining manipulations of large files in complex formats. First, to address the diversity of the mass spectrometric community, ROCKET supports scaling down analyses (i.e., working with large data files on relatively inexpensive hardware without fully loading them into memory), as well as scaling up (i.e., executing a workflow on a cloud or on a multiprocessor). Second, ROCKET generates an efficient mixture of R and target code which is compiled in the background for the particular deployment platform. By ensuring compatibility with mass spectrometry-specific open data storage standards, supporting multiple hardware scenarios, and generating optimized code, ROCKET enables the development of general analytical methods. Therefore, ROCKET aims to democratize access to R-based data analysis for a broader community of life scientists, and create a blueprint for a new paradigm for R-based computing with large datasets. The outcomes of the project will be documented and made publicly available at https://olga-vitek-lab.khoury.northeastern.edu/.
This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.
Protein biomarkers on tissue as imaged via MALDI mass spectrometry: A systematic approach to study the limits of detection.
van de Ven, S. M. W. Y., Bemis, K. D., Lau, K., Adusumilli, R., Kota, U., Stolowitz, M., et al. (2016). Protein biomarkers on tissue as imaged via MALDI mass spectrometry: A systematic approach to study the limits of detection. Proteomics, 16(11-12), 1660–1669. http://doi.org/10.1002/pmic.201500515
MALDI mass spectrometry imaging (MSI) is emerging as a tool for protein and peptide imaging across tissue sections. Despite extensive study, there does not yet exist a baseline study evaluating the potential capabilities for this technique to detect diverse proteins in tissue sections. In this study, we developed a systematic approach for characterizing MALDI-MSI workflows in terms of limits of detection, coefficients of variation, spatial resolution, and the identification of endogenous tissue proteins. Our goal was to quantify these figures of merit for a number of different proteins and peptides, in order to gain more insight into the feasibility of protein biomarker discovery efforts using this technique. Control proteins and peptides were deposited in serial dilutions on thinly sectioned mouse xenograft tissue. Using our experimental setup, coefficients of variation were <30% on tissue sections and spatial resolution was 200 μm (or greater). Limits of detection for proteins and peptides on tissue were in the micromolar to millimolar range. Protein identification was only possible for proteins present in high abundance in the tissue. These results provide a baseline for the application of MALDI-MSI towards the discovery of new candidate biomarkers and a new benchmarking strategy that can be used for comparing diverse MALDI-MSI workflows.
Statistical detection of differentially abundant ions in mass spectrometry-based imaging experiments with complex designs
K. A. Bemis, D. Guo, A. Harry, M. Thomas, I. Lanekoff, M. Stenzel-Poore, S. Stevens, J. Laskin, and O. Vitek. “Statistical detection of differentially abundant ions in mass spectrometry-based imaging experiments with complex designs.” International Journal of Mass Spectrometry, 2019.
Mass Spectrometry Imaging (MSI) characterizes changes in chemical composition between regions of biological samples such as tissues. One goal of statistical analysis of MSI experiments is class comparison, i.e. determining analytes that change in abundance between conditions more systematically than as expected by random variation. To reach accurate and reproducible conclusions, statistical analysis must appropriately reflect the initial research question, the design of the MSI experiment, and all the associated sources of variation. This manuscript highlights the importance of following these general statistical principles. Using the example of two case studies with complex experimental designs, and with different strategies of data acquisition, we demonstrate the extent to which choices made at key points of this workflow impact the results, and provide suggestions for appropriate design and analysis of MSI experiments that aim at detecting differentially abundant analytes.
D. Guo, K. A. Bemis, C. Rawlins, J. Agar, and O. Vitek. “Unsupervised segmentation of mass spectrometric ion images characterizes morphology of tissues.” Bioinformatics, 2019.
Mass spectrometry imaging (MSI) characterizes the spatial distribution of ions in complex biological samples such as tissues. Since many tissues have complex morphology, treatments and conditions often affect the spatial distribution of the ions in morphology-specific ways. Evaluating the selectivity and the specificity of ion localization and regulation across morphology types is biologically important. However, MSI lacks algorithms for segmenting images at both single-ion and spatial resolution.
This article contributes spatial-Dirichlet Gaussian mixture model (DGMM), an algorithm and a workflow for the analyses of MSI experiments, that detects components of single-ion images with homogeneous spatial composition. The approach extends DGMMs to account for the spatial structure of MSI. Evaluations on simulated and experimental datasets with diverse MSI workflows demonstrated that spatial-DGMM accurately segments ion images, and can distinguish ions with homogeneous and heterogeneous spatial distribution. We also demonstrated that the extracted spatial information is useful for downstream analyses, such as detecting morphology-specific ions, finding groups of ions with similar spatial patterns, and detecting changes in chemical composition of tissues between conditions.
Kylie A. Bemis and Olga Vitek
We introduce matter, an R package for direct interactions with larger-than-memory datasets, stored in an arbitrary number of files of any size. matter is primarily designed for datasets in new and rapidly evolving file formats, which may lack extensive software support. matter enables a wide variety of data exploration and manipulation steps and is extensible to many bioinformatics applications. It supports reproducible research by minimizing the need to convert and store data in multiple formats. We illustrate the performance of matter in conjunction with the Bioconductor package Cardinal for analysis of high-resolution, high-throughput mass spectrometry imaging experiments.
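The general idea of working with a dataset without loading it fully into memory can be illustrated with a small sketch. This is not the matter API (matter is an R package); it is a Python analogy using only the standard library, and the flat float64 file layout, the file name `spectra.bin`, and the helper names `write_matrix` and `column_means` are all assumptions made for illustration.

```python
import os
import tempfile
from array import array


def write_matrix(path, rows):
    """Write rows of floats to a flat binary file (row-major, float64)."""
    with open(path, "wb") as f:
        for row in rows:
            array("d", row).tofile(f)


def column_means(path, n_cols, chunk_rows=2):
    """Compute column means while reading at most `chunk_rows` rows at a time."""
    sums = [0.0] * n_cols
    n_rows = 0
    item_size = array("d").itemsize
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_rows * n_cols * item_size)
            if not raw:
                break  # end of file
            chunk = array("d")
            chunk.frombytes(raw)
            # Accumulate each row of the chunk into the running column sums.
            for start in range(0, len(chunk), n_cols):
                for j in range(n_cols):
                    sums[j] += chunk[start + j]
                n_rows += 1
    if n_rows == 0:
        raise ValueError("file contains no rows")
    return [s / n_rows for s in sums]


# Hypothetical toy data: 3 "spectra" (rows) with 3 features (columns) each.
rows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
path = os.path.join(tempfile.mkdtemp(), "spectra.bin")
write_matrix(path, rows)
means = column_means(path, n_cols=3)
print(means)
```

Only one chunk of rows is held in memory at a time, so peak memory use depends on `chunk_rows` rather than on the size of the file. This is the trade-off that out-of-memory tools typically make: more file reads in exchange for a bounded memory footprint.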
K. D. Bemis, A. Harry, L. S. Eberlin, C. Ferreira, S. M. van de Ven, P. Mallick, M. Stolowitz, O. Vitek. “Cardinal: an R package for statistical analysis of mass spectrometry-based imaging experiments”. Bioinformatics, 31:2418, 2015.
Cardinal is an R package for statistical analysis of mass spectrometry-based imaging (MSI) experiments of biological samples such as tissues. Cardinal supports both Matrix-Assisted Laser Desorption/Ionization (MALDI) and Desorption Electrospray Ionization-based MSI workflows, and experiments with multiple tissues and complex designs. The main analytical functionalities include (1) image segmentation, which partitions a tissue into regions of homogeneous chemical composition, selects the number of segments and the subset of informative ions, and characterizes the associated uncertainty and (2) image classification, which assigns locations on the tissue to pre-defined classes, selects the subset of informative ions, and estimates the resulting classification error by (cross-) validation. The statistical methods are based on mixture modeling and regularization.
Probabilistic Segmentation of Mass Spectrometry (MS) Images Helps Select Important Ions and Characterize Confidence in the Resulting Segments
Kyle D. Bemis, April Harry, Livia S. Eberlin, Christina R. Ferreira, Stephanie M. van de Ven, Parag Mallick, Mark Stolowitz and Olga Vitek
Mass spectrometry imaging is a powerful tool for investigating the spatial distribution of chemical compounds in a biological sample such as tissue. Two common goals of these experiments are unsupervised segmentation of images into newly discovered homogeneous segments and supervised classification of images into predefined classes. In both cases, the important secondary goals are to characterize the uncertainty associated with the segmentation and with the classification and to characterize the spectral features that define each segment or class. Recent analysis methods have focused on the spatial structure of the data to improve results. However, they either do not address these secondary goals or do this with separate post hoc procedures.
We introduce spatial shrunken centroids, a statistical model-based framework for both supervised classification and unsupervised segmentation. It takes as input sets of previously detected, aligned, quantified, and normalized spectral features and expresses both spatial and multivariate nature of the data using probabilistic modeling. It selects informative subsets of spectral features that define each unsupervised segment or supervised class and quantifies and visualizes the uncertainty in spatial segmentations and in tissue classification. In the unsupervised setting, it also guides the choice of an appropriate number of segments. We demonstrate the usefulness of this framework in a supervised human renal cell carcinoma experimental dataset and several unsupervised experimental datasets, including a pig fetus cross-section, three rodent brains, and a controlled image with known ground truth. This framework is available for use within the open-source R package Cardinal as part of a full pipeline for the processing, visualization, and statistical analysis of mass spectrometry imaging experiments.