Welcome to the Applied Machine Learning Group at Northeastern
Professor Brodley’s research group, Applied Machine Learning at the Khoury College of Computer Sciences, focuses on core issues of machine learning, as well as real-world applications of machine learning.
Recent core issues investigated include:
- Active learning
- Conditional random fields
- Constraint-based clustering
- Crowdsourcing in clustering
Recent applications (and collaborators) include:
- Predicting disease course for Multiple Sclerosis patients (Neurology, Harvard Medical School)
- Detecting focal cortical dysplasia in the MRIs of epilepsy patients (Neuroscience and Radiology, NYU Medical School)
- Understanding the predictors of unilateral vestibular disorders (Mass Eye and Ear)
Carla E. Brodley
Dean - Khoury College of Computer Sciences
Removing confounding factors via constraint-based clustering: An application to finding homogeneous groups of multiple sclerosis patients
Jingjing Liu, Carla E. Brodley, Brian C. Healy, Tanuja Chitnis, Removing confounding factors via constraint-based clustering: An application to finding homogeneous groups of multiple sclerosis patients, Artificial Intelligence in Medicine, 2015
Confounding factors in unsupervised data can lead to undesirable clustering results. For example, in medical datasets, age is often a confounding factor in tests designed to judge the severity of a patient’s disease through measures of mobility, eyesight and hearing. In such cases, removing age from each instance will not remove its effect from the data, as other features will be correlated with age. Motivated by the need to find homogeneous groups of multiple sclerosis (MS) patients, we apply our approach to remove physician subjectivity from patient data.
We present a method based on constraint-based clustering to remove the impact of such confounding factors. Suppose we know which feature (or set of features) is a confounding factor; call it F. Our method first partitions the data into b bins: if F is categorical, instances from the same category form one bin; if F is numeric, we split the data so that each bin contains instances with similar F values. Thus each instance is assigned to a single bin for factor F. We then remove feature F from each instance for the remaining steps. Next, we cluster the data separately in each bin. Using these clustering results, we generate pairwise constraints and then run a constraint-based clustering algorithm to produce a final grouping.
In a series of experiments with synthetic datasets, we compare our proposed methods to detrending when one has numeric confounding factors. We apply our method to the Comprehensive Longitudinal Investigation of Multiple Sclerosis at Brigham and Women's Hospital dataset, and find a novel grouping of patients that can help uncover the factors that impact disease progression in MS.
Our method groups data removing the effect of confounding factors without making any assumptions about the form of the influence of these factors on the other features. We identified clusters of MS patients that have clinically recognizable differences. Because patients more likely to progress are found using this approach, our results have the potential to aid physicians in tailoring treatment decisions for MS patients.
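The first steps of the procedure described in this abstract (binning on F, dropping F, clustering within each bin, and deriving pairwise constraints) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the equal-frequency binning, the tiny k-means, the function names, and the rule "same within-bin cluster gives a must-link, different within-bin clusters give a cannot-link" are all assumptions made for the example. The final constraint-based clustering step is left out; any algorithm that accepts must-link and cannot-link pairs could consume the output.

```python
import random
from itertools import combinations

def bin_by_factor(factor, n_bins):
    """Assign each instance to an equal-frequency bin of the numeric factor F
    (assumed binning rule; the abstract only says bins hold similar F values)."""
    order = sorted(range(len(factor)), key=lambda i: factor[i])
    bins = [0] * len(factor)
    for rank, i in enumerate(order):
        bins[i] = min(rank * n_bins // len(factor), n_bins - 1)
    return bins

def kmeans(points, k, iters=20, seed=0):
    """Tiny Lloyd's k-means on lists of floats; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def constraints_from_bins(features, factor, n_bins=2, k=2):
    """features: rows with F already removed; factor: the F value per row.
    Returns (bins, must_link, cannot_link) as lists of index pairs."""
    bins = bin_by_factor(factor, n_bins)
    must_link, cannot_link = [], []
    for b in range(n_bins):
        idx = [i for i, bi in enumerate(bins) if bi == b]
        labels = kmeans([features[i] for i in idx], k)
        # Assumed rule: constraints are generated only within a bin.
        for (la, i), (lb, j) in combinations(list(zip(labels, idx)), 2):
            (must_link if la == lb else cannot_link).append((i, j))
    return bins, must_link, cannot_link

# Toy data (hypothetical): age is the confounder F, one remaining feature.
age = [20, 22, 21, 23, 60, 62, 61, 63]
rows = [[0.0], [0.1], [5.0], [5.1], [10.0], [10.1], [15.0], [15.1]]
bins, must_link, cannot_link = constraints_from_bins(rows, age)
```

Because constraints are produced only inside a bin, no pair of instances with very different F values is ever forced together or apart by F itself; the final constrained clustering is then free to merge groups across bins.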
Adrian J. Priesol, MD; Mengfei Cao, BA; Carla E. Brodley, PhD; Richard F. Lewis, MD, Clinical Vestibular Testing Assessed With Machine-Learning Algorithms, JAMA Otolaryngol Head Neck Surg. doi:10.1001/jamaoto.2014.3519, 2015.
Importance: Dizziness and imbalance are common clinical problems, and accurate diagnosis depends on determining whether damage is localized to the peripheral vestibular system. Vestibular testing guides this determination, but the accuracy of the different tests is not known.
Objective: To determine how well each element of the vestibular test battery segregates patients with normal peripheral vestibular function from those with unilateral reductions in vestibular function.
Design, Setting, and Participants: Retrospective analysis of vestibular test batteries in 8080 patients. Clinical medical records were reviewed for a subset of individuals with the reviewers blinded to the vestibular test data.
Interventions: A group of machine-learning classifiers were trained using vestibular test data from persons who were “manually” labeled as having normal vestibular function or unilateral vestibular damage based on a review of their medical records. The optimal trained classifier was then used to categorize patients whose diagnoses were unknown, allowing us to determine the information content of each element of the vestibular test battery.
Main Outcomes and Measures: The information provided by each element of the vestibular test battery to segregate individuals with normal vestibular function from those with unilateral vestibular damage.
Results: The time constant calculated from the rotational test ranked first in information content, and measures that were related physiologically to the rotational time constant were 10 of the top 12 highest-ranked variables. The caloric canal paresis ranked eighth, and the other elements of the test battery provided minimal additional information. The sensitivity of the rotational time constant was 77.2%, and the sensitivity of the caloric canal paresis was 59.6%; the specificity of the rotational time constant was 89.0%, and the specificity of the caloric canal paresis was 64.9%. The diagnostic accuracy of the vestibular test battery increased from 72.4% to 93.4% when the data were analyzed with the optimal machine-learning classifier.
Conclusions and Relevance: Rotational testing should be considered the primary test to diagnose unilateral peripheral vestibular damage in patients with dizziness or imbalance. Most physicians, however, continue to rely on caloric tests to guide their diagnoses. Our results support a significant shift in the approach used to determine diagnoses in patients with vestibular symptoms.
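For readers less familiar with the sensitivity and specificity figures quoted in the Results, the standard definitions can be written as a short sketch. The counts below are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the study.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of patients with vestibular damage that the test flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Fraction of patients with normal function that the test clears."""
    return true_neg / (true_neg + false_pos)

# Hypothetical counts for illustration only (not data from the paper).
sens = sensitivity(true_pos=8, false_neg=2)   # 8 / 10
spec = specificity(true_neg=9, false_pos=1)   # 9 / 10
```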
Cortical feature analysis and machine learning improves detection of “MRI-negative” focal cortical dysplasia
Bilal Ahmed, Carla E. Brodley, Karen E. Blackmon, Ruben Kuzniecky, Gilad Barash, Chad Carlson, Brian T. Quinn, Werner Doyle, Jacqueline French, Orrin Devinsky, Thomas Thesen, Cortical feature analysis and machine learning improves detection of “MRI-negative” focal cortical dysplasia, Science Direct, Epilepsy & Behavior, 2015
Focal cortical dysplasia (FCD) is the most common cause of pediatric epilepsy and the third most common lesion in adults with treatment-resistant epilepsy. Advances in MRI have revolutionized the diagnosis of FCD, resulting in higher success rates for resective epilepsy surgery. However, many patients with histologically confirmed FCD have normal presurgical MRI studies (‘MRI-negative’), making presurgical diagnosis difficult. The purpose of this study was to test whether a novel MRI postprocessing method successfully detects histopathologically verified FCD in a sample of patients without visually appreciable lesions. We applied an automated quantitative morphometry approach which computed five surface-based MRI features and combined them in a machine learning model to classify lesional and nonlesional vertices. Accuracy was defined by classifying contiguous vertices as “lesional” when they fell within the surgical resection region. Our multivariate method correctly detected the lesion in 6 of 7 MRI-positive patients, which is comparable with the detection rates that have been reported in univariate vertex-based morphometry studies. More significantly, in patients that were MRI-negative, machine learning correctly identified 14 out of 24 FCD lesions (58%). This was achieved after separating abnormal thickness and thinness into distinct classifiers, as well as separating sulcal and gyral regions. Results demonstrate that MRI-negative images contain sufficient information to aid in the in vivo detection of visually elusive FCD lesions.