440 Huntington Avenue
310E West Village H
Boston, MA 02115
ATTN: Ehsan Elhamifar, 202 WVH
360 Huntington Avenue
Boston, MA 02115
- Machine Learning: subset selection, zero and few shot learning, manifold clustering, high-rank matrix completion, nonlinear dynamical models, deep neural networks
- Computer Vision: procedure learning, video summarization, large-scale multi-label recognition, motion and activity segmentation, active learning for visual data
- Optimization: sparse and low-rank recovery, structured submodular maximization, convex and non-convex optimization
- PhD in Electrical and Computer Engineering, Johns Hopkins University
- MS in Applied Mathematics and Statistics, Johns Hopkins University
- MS in Electrical Engineering, Sharif University of Technology
Ehsan Elhamifar is an Assistant Professor in the Khoury College of Computer Sciences and is the director of the Mathematical, Computational and Applied Data Science (MCADS) Lab at Northeastern University. He is affiliated with the Electrical and Computer Engineering Department at Northeastern. Prof. Elhamifar is a recipient of the DARPA Young Faculty Award and the NSF CISE Career Research Initiation Initiative Award. Previously, he was a postdoctoral scholar in the Electrical Engineering and Computer Science department at UC Berkeley. Prof. Elhamifar obtained his PhD from the Electrical and Computer Engineering department at the Johns Hopkins University.
Prof. Elhamifar’s research areas are machine learning, computer vision and optimization. He is interested in developing scalable, robust and provable algorithms that can address challenges of complex and massive high-dimensional data. He works on applications of these tools in computer vision and robotics among others. Specifically, he uses tools from convex, nonconvex and submodular optimization, sparse and low-rank modeling, deep learning, high-dimensional statistics and graphical models to develop algorithms and theory and applies them to solve real-world challenging problems, including big data summarization, procedure learning from instructional data, large-scale recognition with small labeled data and active learning for visual data.
Deep Supervised Summarization: Algorithm and Application to Learning Instructions C. Xu and E. Elhamifar, Neural Information Processing Systems (NeurIPS), 2019.
We address the problem of finding representative points of datasets by learning from multiple datasets and their ground-truth summaries. We develop a supervised subset selection framework, based on the facility location utility function, which learns to map datasets to their ground-truth representatives. To do so, we propose to learn representations of data so that the input of transformed data to the facility location recovers their ground-truth representatives. Given the NP-hardness of the utility function, we consider its convex relaxation based on sparse representation and investigate conditions under which the solution of the convex optimization recovers ground-truth representatives of each dataset. We design a loss function whose minimization over the parameters of the data representation network leads to satisfying the theoretical conditions, hence guaranteeing recovering groundtruth summaries. Given the non-convexity of the loss function, we develop an efficient learning scheme that alternates between representation learning by minimizing our proposed loss given the current assignments of points to ground-truth representatives and updating assignments given the current data representation. By experiments on the problem of learning key-steps (subactivities) of instructional videos, we show that our proposed framework improves the state-of-the-art supervised subset selection algorithms.
Unsupervised Procedure Learning via Joint Dynamic Summarization. E. Elhamifar and Z. Naing, International Conference on Computer Vision (ICCV), 2019.
We address the problem of unsupervised procedure learning from unconstrained instructional videos. Our goal is to produce a summary of the procedure key-steps and their ordering needed to perform a given task, as well as localization of the key-steps in videos. We develop a collaborative sequential subset selection framework, where we build a dynamic model on videos by learning states and transitions between them, where states correspond to different subactivities, including background and procedure steps. To extract procedure key-steps, we develop an optimization framework that finds a sequence of a small number of states that well represents all videos and is compatible with the state transition model. Given that our proposed optimization is non-convex and NP-hard, we develop a fast greedy algorithm whose complexity is linear in the length of the videos and the number of states of the dynamic model, hence, scales to large datasets. Under appropriate conditions on the transition model, our proposed formulation is approximately submodular, hence, comes with performance guarantees. We also present ProceL, a new multimodal dataset of 47.3 hours of videos and their transcripts from diverse tasks, for procedure learning evaluation. By extensive experiments, we show that our framework significantly improves the state of the art performance.
Facility Location: Approximate Submodularity and Greedy Algorithm, E. Elhamifar, International Conference on Machine Learning (ICML), 2019.
We develop and analyze a novel utility function and a fast optimization algorithm for subset selection in sequential data that incorporates the dynamic model of data. We propose a cardinality constrained sequential facility location function that finds a fixed number of representatives, where the sequence of representatives is compatible with the dynamic model and well encodes the data. As maximizing this new objective function is NPhard, we develop a fast greedy algorithm based on submodular maximization. Unlike the conventional facility location, the computation of the marginal gain in our case cannot be done by operations on each item independently. We exploit the sequential structure of the problem and develop an efficient dynamic programming-based algorithm that computes the marginal gain exactly. We investigate conditions on the dynamic model, under which our utility function is (ε-approximately) submodular, hence, the greedy algorithm comes with performance guarantees. By experiments on synthetic data and the problem of procedure learning from instructional videos, we show that our framework significantly improves the computational time, achieves better objective function values and obtains more coherent summaries.
Y. Wang and E. Elhamifar AAAI Conference on Artificial Intelligence (AAAI), 2018.
We address the problem of high-rank matrix completion with side information. In contrast to existing work dealing with side information, which assume that the data matrix is low-rank, we consider the more general scenario where the columns of the data matrix are drawn from a union of lowdimensional subspaces, which can lead to a high rank matrix. Our goal is to complete the matrix while taking advantage of the side information. To do so, we use the self-expressive property of the data, searching for a sparse representation of each column of matrix as a combination of a few other columns. More specifically, we propose a factorization of the data matrix as the product of side information matrices with an unknown interaction matrix, under which each column of the data matrix can be reconstructed using a sparse combination of other columns. As our proposed optimization, searching for missing entries and sparse coefficients, is non-convex and NP-hard, we propose a lifting framework, where we couple sparse coefficients and missing values and define an equivalent optimization that is amenable to convex relaxation. We also propose a fast implementation of our convex framework using a Linearized Alternating Direction Method. By extensive experiments on both synthetic and real data, and, in particular, by studying the problem of multi-label learning, we demonstrate that our method outperforms existing techniques in both low-rank and high-rank data regimes.
E. Elhamifar and M. C. De Paolis Kaluza; Neural Information Processing Systems (NIPS), 2017.
Subset selection, which is the task of finding a small subset of representative items from a large ground set, finds numerous applications in different areas. Sequential data, including time-series and ordered data, contain important structural relationships among items, imposed by underlying dynamic models of data, that should play a vital role in the selection of representatives. However, nearly all existing subset selection techniques ignore underlying dynamics of data and treat items independently, leading to incompatible sets of representatives. In this paper, we develop a new framework for sequential subset selection that finds a set of representatives compatible with the dynamic models of data. To do so, we equip items with transition dynamic models and pose the problem as an integer binary optimization over assignments of sequential items to representatives, that leads to high encoding, diversity and transition potentials. Our formulation generalizes the well-known facility location objective to deal with sequential data, incorporating transition
dynamics among facilities. As the proposed formulation is non-convex, we derive a max-sum message passing algorithm to solve the problem efficiently. Experiments on synthetic and real data, including instructional video summarization, show that our sequential subset selection framework not only achieves better encoding and diversity than the state of the art, but also successfully incorporates dynamics of data, leading to compatible representatives.
E. Elhamifar and M. C. De Paolis Kaluza IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
We consider the problem of subset selection in the online setting, where data arrive incrementally. Instead of storing and running subset selection on the entire dataset, we propose an incremental subset selection framework that, at each time instant, uses the previously selected set of representatives and the new batch of data in order to update the set of representatives. We cast the problem as an integer binary optimization minimizing the encoding cost of the data via representatives regularized by the number of selected items. As the proposed optimization is, in general, NP-hard and non-convex, we study a greedy approach based on unconstrained submodular optimization and also propose an efficient convex relaxation. We show that, under appropriate conditions, the solution of our proposed convex algorithm achieves the global optimal solution of the non-convex problem. Our results also address the conventional problem of subset selection in the offline setting, as a special case. By extensive experiments on the problem of video summarization,
we demonstrate that our proposed online subset selection algorithms perform well on real data, capturing diverse representative events in videos, while they obtain objective function values close to the offline setting.
Dissimilarity-based Sparse Subset Selection, E. Elhamifar, G. Sapiro and S. Sastry, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2016.
Finding an informative subset of a large number of data points or models is at the center of many problems in machine learning, computer vision, bio/health informatics and image/signal processing. Given pairwise dissimilarities between the elements of a `source set’ and a `target set,’ we consider the problem of finding a subset of the source set, called representatives or exemplars, that can efficiently describe the target set. We formulate the problem as a row-sparsity regularized trace minimization problem. Since the proposed formulation is, in general, an NP-hard problem, we consider a convex relaxation. The solution of our proposed optimization program finds the representatives and the probability that each element of the target set is associated with the representatives. We analyze the solution of our proposed optimization as a function of the regularization parameter. We show that when the two sets jointly partition into multiple groups, the solution of our proposed optimization program finds representatives from all groups and reveals clustering of the sets. In addition, we show that our proposed formulation can effectively deal with outliers. Our algorithm works with arbitrary dissimilarities, which can be asymmetric or violate the triangle inequality. To efficiently implement our proposed algorithm, we consider an Alternating Direction Method of Multipliers (ADMM) framework, which results in quadratic complexity in the problem size. We show that the ADMM implementation allows to parallelize the algorithm, hence further reducing the computational cost. Finally, by experiments on real-world datasets, we show that our proposed algorithm improves the state of the art on the two problems of scene categorization using representative images and time-series modeling and segmentation using representative models.
Energy Disaggregation via Learning ‘Powerlets’ and Sparse Coding, E. Elhamifar and S. Sastry, AAAI Conference on Artificial Intelligence (AAAI), 2015.
In this paper, we consider the problem of energy disaggregation, i.e., decomposing a whole home electricity signal into its component appliances. We propose a new supervised algorithm, which in the learning stage, automatically extracts signature consumption patterns of each device by modeling the device as a mixture of dynamical systems. In order to extract signature consumption patterns of a device corresponding to its different modes of operation, we define appropriate dissimilarities between energy snippets of the device and use them in a subset selection scheme, which we generalize to deal with time-series data. We then form a dictionary that consists of extracted power signatures across all devices. We cast the disaggregation problem as an optimization over a representation in the learned dictionary and incorporate several novel priors such as device-sparsity, knowledge about devices that do or do not work together as well as temporal consistency of the disaggregated solution. Real experiments on a publicly available energy dataset demonstrate that our proposed algorithm achieves promising results for energy disaggregation.
Robust Subspace Clustering, M. Soltanolkotabi, E. Elhamifar and E. J. Candes, Annals of Statistics, 2014.
Subspace clustering refers to the task of finding a multi-subspace representation that best fits a collection of points taken from a high-dimensional space. This paper introduces an algorithm inspired by sparse subspace clustering (SSC)  to cluster noisy data, and develops some novel theory demonstrating its correctness. In particular, the theory uses ideas from geometric functional analysis to show that the algorithm can accurately recover the underlying subspaces under minimal requirements on their orientation, and on the number of samples per subspace. Synthetic as well as real data experiments complement our theoretical study, illustrating our approach and demonstrating its effectiveness.
Sparse Subspace Clustering: Algorithm, Theory, and Applications, E. Elhamifar and R. Vidal, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2013.
In many real-world problems, we are dealing with collections of high-dimensional data, such as images, videos, text and web documents, DNA microarray data, and more. Often, high-dimensional data lie close to low-dimensional structures corresponding to several classes or categories the data belongs to. In this paper, we propose and study an algorithm, called Sparse Subspace Clustering (SSC), to cluster data points that lie in a union of low-dimensional subspaces. The key idea is that, among infinitely many possible representations of a data point in terms of other points, a sparse representation corresponds to selecting a few points from the same subspace. This motivates solving a sparse optimization program whose solution is used in a spectral clustering framework to infer the clustering of data into subspaces. Since solving the sparse optimization program is in general NP-hard, we consider a convex relaxation and show that, under appropriate conditions on the arrangement of subspaces and the distribution of data, the proposed minimization program succeeds in recovering the desired sparse representations. The proposed algorithm can be solved efficiently and can handle data points near the intersections of subspaces. Another key advantage of the proposed algorithm with respect to the state of the art is that it can deal with data nuisances, such as noise, sparse outlying entries, and missing entries, directly by incorporating the model of the data into the sparse optimization program. We demonstrate the effectiveness of the proposed algorithm through experiments on synthetic data as well as the two real-world problems of motion segmentation and face clustering.
A Convex Optimization Framework for Active Learning, E. Elhamifar, G. Sapiro, A. Yang and S. Sastry, International Conference on Computer Vision (ICCV), 2013.
In many image/video/web classification problems, we have access to a large number of unlabeled samples. However, it is typically expensive and time consuming to obtain labels for the samples. Active learning is the problem of progressively selecting and annotating the most informative unlabeled samples, in order to obtain a high classi- fication performance. Most existing active learning algorithms select only one sample at a time prior to retraining the classifier. Hence, they are computationally expensive and cannot take advantage of parallel labeling systems such as Mechanical Turk. On the other hand, algorithms that allow the selection of multiple samples prior to retraining the classifier, may select samples that have significant information overlap or they involve solving a non-convex optimization. More importantly, the majority of active learning algorithms are developed for a certain classifier type such as SVM. In this paper, we develop an efficient active learning framework based on convex programming, which can select multiple samples at a time for annotation. Unlike the state of the art, our algorithm can be used in conjunction with any type of classifiers, including those of the family of the recently proposed Sparse Representation-based Classification (SRC). We use the two principles of classi- fier uncertainty and sample diversity in order to guide the optimization program towards selecting the most informative unlabeled samples, which have the least information overlap. Our method can incorporate the data distribution in the selection process by using the appropriate dissimilarity between pairs of samples. We show the effectiveness of our framework in person detection, scene categorization and face recognition on real-world datasets.
Block-Sparse Recovery via Convex Optimization, E. Elhamifar and R. Vidal, IEEE Transactions on Signal Processing (TSP), 2012.
Given a dictionary that consists of multiple blocks and a signal that lives in the range space of only a few blocks, we study the problem of finding a block-sparse representation of the signal, i.e., a representation that uses the minimum number of blocks. Motivated by signal/image processing and computer vision applications, such as face recognition, we consider the block-sparse recovery problem in the case where the number of atoms in each block is arbitrary, possibly much larger than the dimension of the underlying subspace. To find a block-sparse representation of a signal, we propose two classes of non-convex optimization programs, which aim to minimize the number of nonzero coefficient blocks and the number of nonzero reconstructed vectors from the blocks, respectively. Since both classes of problems are NP-hard, we propose convex relaxations and derive conditions under which each class of the convex programs is equivalent to the original non-convex formulation. Our conditions depend on the notions of mutual and cumulative subspace coherence of a dictionary, which are natural generalizations of existing notions of mutual and cumulative coherence. We evaluate the performance of the proposed convex programs through simulations as well as real experiments on face recognition. We show that treating the face recognition problem as a block-sparse recovery problem improves the state-of-the-art results by 10% with only 25% of the training data.
Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery, E. Elhamifar, G. Sapiro and R. Vidal, Neural Information Processing Systems (NIPS), 2012.
Given pairwise dissimilarities between data points, we consider the problem of finding a subset of data points, called representatives or exemplars, that can efficiently describe the data collection. We formulate the problem as a row-sparsity regularized trace minimization problem that can be solved efficiently using convex programming. The solution of the proposed optimization program finds the representatives and the probability that each data point is associated with each one of the representatives. We obtain the range of the regularization parameter for which the solution of the proposed optimization program changes from selecting one representative for all data points to selecting all data points as representatives. When data points are distributed around multiple clusters according to the dissimilarities, we show that the data points in each cluster select representatives only from that cluster. Unlike metric-based methods, our algorithm can be applied to dissimilarities that are asymmetric or violate the triangle inequality, i.e., it does not require that the pairwise dissimilarities come from a metric. We demonstrate the effectiveness of the proposed algorithm on synthetic data as well as real-world image and text data.
See All by Looking at A Few: Sparse Modeling for Finding Representative Objects, E. Elhamifar, G. Sapiro and R. Vidal, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012.
We consider the problem of finding a few representatives for a dataset, i.e., a subset of data points that efficiently describes the entire dataset. We assume that each data point can be expressed as a linear combination of the representatives and formulate the problem of finding the representatives as a sparse multiple measurement vector problem. In our formulation, both the dictionary and the measurements are given by the data matrix, and the unknown sparse codes select the representatives via convex optimization. In general, we do not assume that the data are lowrank or distributed around cluster centers. When the data do come from a collection of low-rank models, we show that our method automatically selects a few representatives from each low-rank model. We also analyze the geometry of the representatives and discuss their relationship to the vertices of the convex hull of the data. We show that our framework can be extended to detect and reject outliers in datasets, and to efficiently deal with new observations and large datasets. The proposed framework and theoretical foundations are illustrated with examples in video summarization and image classification using representatives.
Sparse Hidden Markov Models for Surgical Gesture Classification and Skill Evaluation, L. Tao, E. Elhamifar, S. Khudanpur, G. Hager, and R. Vidal, Information Processing in Computer Assisted Interventions (IPCAI), 2012.
We consider the problem of classifying surgical gestures and skill level in robotic surgical tasks. Prior work in this area models gestures as states of a hidden Markov model (HMM) whose observations are discrete, Gaussian or factor analyzed. While successful, these approaches are limited in expressive power due to the use of discrete or Gaussian observations. In this paper, we propose a new model called sparse HMMs whose observations are sparse linear combinations of elements from a dictionary of basic surgical motions. Given motion data from many surgeons with different skill levels, we propose an algorithm for learning a dictionary for each gesture together with an HMM grammar describing the transitions among different gestures. We then use these dictionaries and the grammar to represent and classify new motion data. Experiments on a database of surgical motions acquired with the da Vinci system show that our method performs on par with or better than state-of-the-art methods.This suggests that learning a grammar based on sparse motion dictionaries is important in gesture and skill classification.
Sparse Manifold Clustering and Embedding, E. Elhamifar and R. Vidal, Neural Information Processing Systems (NIPS), 2011.
We propose an algorithm called Sparse Manifold Clustering and Embedding (SMCE) for simultaneous clustering and dimensionality reduction of data lying in multiple nonlinear manifolds. Similar to most dimensionality reduction methods, SMCE finds a small neighborhood around each data point and connects each point to its neighbors with appropriate weights. The key difference is that SMCE finds both the neighbors and the weights automatically. This is done by solving a sparse optimization problem, which encourages selecting nearby points that lie in the same manifold and approximately span a low-dimensional affine subspace. The optimal solution encodes information that can be used for clustering and dimensionality reduction using spectral clustering and embedding. Moreover, the size of the optimal neighborhood of a data point, which can be different for different points, provides an estimate of the dimension of the manifold to which the point belongs. Experiments demonstrate that our method can effectively handle multiple manifolds that are very close to each other, manifolds with non-uniform sampling and holes, as well as estimate the intrinsic dimensions of the manifolds.
Robust Classification using Structured Sparse Representation, E. Elhamifar and R. Vidal, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
In many problems in computer vision, data in multiple classes lie in multiple low-dimensional subspaces of a highdimensional ambient space. However, most of the existing classification methods do not explicitly take this structure into account. In this paper, we consider the problem of classification in the multi-subspace setting using sparse representation techniques. We exploit the fact that the dictionary of all the training data has a block structure where the training data in each class form few blocks of the dictionary. We cast the classification as a structured sparse recovery problem where our goal is to find a representation of a test example that uses the minimum number of blocks from the dictionary. We formulate this problem using two different classes of non-convex optimization programs. We propose convex relaxations for these two non-convex programs and study conditions under which the relaxations are equivalent to the original problems. In addition, we show that the proposed optimization programs can be modified properly to also deal with corrupted data. To evaluate the proposed algorithms, we consider the problem of automatic face recognition. We show that casting the face recognition problem as a structured sparse recovery problem can improve the results of the state-of-the-art face recognition algorithms, especially when we have relatively small number of training data for each class. In particular, we show that the new class of convex programs can improve the state-ofthe-art face recognition results by 10% with only 25% of the training data. In addition, we show that the algorithms are robust to occlusion, corruption, and disguise.
Sparse Subspace Clustering, E. Elhamifar and R. Vidal, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
We propose a method based on sparse representation (SR) to cluster data drawn from multiple low-dimensional linear or affine subspaces embedded in a high-dimensional space. Our method is based on the fact that each point in a union of subspaces has a SR with respect to a dictionary formed by all other data points. In general, finding such a SR is NP hard. Our key contribution is to show that, under mild assumptions, the SR can be obtained ’exactly’ by using !1 optimization. The segmentation of the data is obtained by applying spectral clustering to a similarity matrix built from this SR. Our method can handle noise, outliers as well as missing data. We apply our subspace clustering algorithm to the problem of segmenting multiple motions in video. Experiments on 167 video sequences show that our approach significantly outperforms state-of-the-art methods.