805 Columbus Avenue
522 Interdisciplinary Science and Engineering Complex (ISEC)
Boston, MA 02120
ATTN: Christopher Amato, 435 ISEC
360 Huntington Avenue
Boston, MA 02115
- Artificial Intelligence
- Machine Learning
- PhD, University of Massachusetts – Amherst
- MS, University of Massachusetts – Amherst
- BA, Tufts University
Christopher Amato is an Assistant Professor at Northeastern University. He received a BA from Tufts University and an MS and a PhD from the University of Massachusetts, Amherst. Before joining Northeastern, Dr. Amato was a Research Scientist at Aptima, Inc., a Postdoc and Research Scientist at MIT, and an Assistant Professor at the University of New Hampshire. He has published papers in leading artificial intelligence and robotics conferences, winning a best paper prize at AAMAS-14 and receiving a best paper nomination at RSS-15. He has also co-organized several tutorials on team decision making and co-authored a book on the same subject. His research focuses on decision making under uncertainty in multi-agent and multi-robot systems.
Yuchen Xiao, Sammie Katt, Andreas ten Pas, Shengjian Chen and Christopher Amato. In the Proceedings of the 2019 IEEE International Conference on Robotics and Automation (ICRA-19), May 2019.
The problem of finding and grasping a target object in a cluttered, uncertain environment, known as target object search, is a common and important problem in robotics. One key challenge is the uncertainty in locating and recognizing each object in a cluttered environment due to noisy perception and occlusions. Furthermore, uncertainty in localization makes manipulation difficult and its outcomes uncertain. To cope with these challenges, we formulate the target object search task as a partially observable Markov decision process (POMDP), enabling the robot to reason about perceptual and manipulation uncertainty while searching. To further address the manipulation difficulty, we propose Parameterized Action Partially Observable Monte-Carlo Planning (PA-POMCP), an algorithm that evaluates manipulation actions by taking into account the effect of the robot’s current belief on the success of the action execution. In addition, a novel run-time initial belief generator and a state value estimator are introduced in this paper to facilitate the PA-POMCP algorithm. Our experiments show that our methods solve the target object search task in settings where simpler methods either require more object movements or fail.
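Reasoning about perceptual uncertainty in a POMDP means maintaining a belief, a probability distribution over hidden states, and updating it after every action and observation. The sketch below shows the generic exact belief update for a small discrete POMDP; the tabular models `T[s][a][s']` and `O[a][s'][o]` and the two-location example are illustrative assumptions, not the models used in the paper. POMCP-style planners typically approximate this update with particles rather than computing it exactly.

```python
from typing import List

Matrix3 = List[List[List[float]]]


def belief_update(belief: List[float], action: int, observation: int,
                  T: Matrix3, O: Matrix3) -> List[float]:
    """Exact discrete Bayes filter.

    b'(s') is proportional to O[a][s'][o] * sum_s T[s][a][s'] * b(s).
    """
    n = len(belief)
    unnormalized = []
    for s_next in range(n):
        # Prediction step: probability of landing in s_next after taking `action`.
        predicted = sum(T[s][action][s_next] * belief[s] for s in range(n))
        # Correction step: weight by how likely the received observation is in s_next.
        unnormalized.append(O[action][s_next][observation] * predicted)
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation is impossible under the current belief")
    return [p / total for p in unnormalized]


# Hypothetical example: the target is hidden behind the left (0) or right (1) object.
# A single "look" action (0) does not move it, and the detector is 80% accurate.
T = [[[1.0, 0.0]], [[0.0, 1.0]]]   # T[s][a][s']
O = [[[0.8, 0.2], [0.2, 0.8]]]     # O[a][s'][o], o = 0 "seen left", 1 "seen right"
b = belief_update([0.5, 0.5], action=0, observation=0, T=T, O=O)
print(b)
```

A single "seen left" observation moves the uniform belief to 0.8 on the left location; repeated looks sharpen it further, which is the kind of information a planner can weigh against the cost of moving objects.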
Sammie Katt, Frans A. Oliehoek and Christopher Amato. In the Proceedings of the Eighteenth International Conference on Autonomous Agents and Multiagent Systems (AAMAS-19), May 2019.
Model-based Bayesian Reinforcement Learning (BRL) provides a principled solution to the exploration-exploitation trade-off, but such methods typically assume fully observable environments. The few BRL methods that are applicable in partially observable domains, such as the Bayes-Adaptive POMDP (BA-POMDP), scale poorly. To address this issue, we introduce the Factored BA-POMDP model (FBA-POMDP), a framework that can learn a compact model of the dynamics by exploiting the underlying structure of a POMDP. The FBA-POMDP framework casts the problem as a planning task, for which we adapt the Monte-Carlo Tree Search planning algorithm and develop a belief tracking method to approximate the joint posterior over the state and model variables. Our empirical results show that this method outperforms a number of BRL baselines and is able to learn efficiently when the factorization is known, as well as learn both the factorization and the model parameters simultaneously.
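The compactness gained from a factorization can be illustrated with a factored transition model in which each state variable keeps its own Dirichlet counts, conditioned only on its parent variables and the action. The sketch below is a generic illustration under that assumption; the class name, the parent lists, and the symmetric prior are hypothetical, and it leaves out the observation model, structure learning, and the tree-search planner that the FBA-POMDP framework also involves.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Key = Tuple[Tuple[int, ...], int]  # (values of the variable's parents, action)


class FactoredDirichletModel:
    """Hypothetical factored transition model with one Dirichlet posterior
    per (state variable, parent assignment, action).

    P(s' | s, a) = prod_i P(s'_i | s[parents[i]], a), where each factor is
    the posterior mean of its Dirichlet counts.
    """

    def __init__(self, domain_sizes: Sequence[int],
                 parents: Sequence[Sequence[int]], prior: float = 1.0):
        self.domain_sizes = list(domain_sizes)
        self.parents = [list(p) for p in parents]
        self.prior = prior
        self.counts: List[Dict[Key, List[float]]] = [{} for _ in self.domain_sizes]

    def _key(self, i: int, state: Sequence[int], action: int) -> Key:
        return tuple(state[j] for j in self.parents[i]), action

    def _row(self, i: int, key: Key) -> List[float]:
        # Unseen parent assignments fall back to the symmetric Dirichlet prior.
        return self.counts[i].get(key, [self.prior] * self.domain_sizes[i])

    def update(self, state: Sequence[int], action: int,
               next_state: Sequence[int]) -> None:
        """Increment one count per variable after observing a transition."""
        for i in range(len(self.domain_sizes)):
            key = self._key(i, state, action)
            row = list(self._row(i, key))
            row[next_state[i]] += 1.0
            self.counts[i][key] = row

    def prob(self, state: Sequence[int], action: int,
             next_state: Sequence[int]) -> float:
        """Posterior-mean probability of the full next state."""
        p = 1.0
        for i in range(len(self.domain_sizes)):
            row = self._row(i, self._key(i, state, action))
            p *= row[next_state[i]] / sum(row)
        return p


# Two binary variables: x0 depends only on itself, x1 depends only on x0.
model = FactoredDirichletModel(domain_sizes=[2, 2], parents=[[0], [0]])
model.update(state=(0, 0), action=0, next_state=(1, 0))
p = model.prob((0, 0), 0, (1, 0))
total = sum(model.prob((0, 0), 0, ns) for ns in product(range(2), repeat=2))
print(p, total)
```

With n binary variables, a flat tabular model needs counts over all 2^n next states for every (s, a) pair, whereas this factored form needs only two counts per variable for each assignment of that variable's parents.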
Christopher Amato, George Konidaris, Jonathan P. How and Leslie P. Kaelbling. In the Journal of Artificial Intelligence Research (JAIR), vol. 64, pages 817-859, March 2019.
Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized multi-agent decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent’s actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions that may require different amounts of time to execute. We model macro-actions as options in a Dec-POMDP, focusing on actions that depend only on information directly available to the agent during execution. Therefore, we model systems where coordination decisions occur only at the level of deciding which macro-actions to execute. The core technical difficulty in this setting is that the options chosen by each agent no longer terminate at the same time. We extend three leading Dec-POMDP algorithms for policy generation to the macro-action case, and demonstrate their effectiveness in both standard benchmarks and a multi-robot coordination problem. The results show that our new algorithms retain agent coordination while allowing high-quality solutions to be generated for significantly longer horizons and larger state spaces than previous Dec-POMDP methods. Furthermore, in the multi-robot domain, we show that, in contrast to most existing methods that are specialized to a particular problem class, our approach can synthesize control policies that exploit opportunities for coordination while balancing uncertainty, sensor information, and information about other agents.
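The asynchrony described above can be seen in a small simulation: each agent chooses a new macro-action only when its previous one terminates, so agents reach decision points at different times. The sketch below assumes deterministic macro-action durations and a fixed, open-loop cycle of macro-actions per agent; both are simplifications, since macro-action durations can vary and each agent's policy would normally condition on its own history of macro-actions and observations. The function and macro-action names are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def simulate_macro_actions(durations: Dict[str, int],
                           policies: Dict[int, List[str]],
                           horizon: int) -> List[Tuple[int, int, str]]:
    """Return (time, agent, macro_action) for every decision point before `horizon`.

    An agent only chooses again when its current macro-action terminates,
    so different agents make decisions at different times. Durations must be
    positive integers so that time always advances.
    """
    events: List[Tuple[int, int, str]] = []
    next_index = {agent: 0 for agent in policies}
    # Priority queue of (time at which the agent becomes free to decide, agent id).
    queue = [(0, agent) for agent in policies]
    heapq.heapify(queue)
    while queue:
        t, agent = heapq.heappop(queue)
        if t >= horizon:
            continue  # this agent's next decision falls outside the horizon
        plan = policies[agent]
        macro = plan[next_index[agent] % len(plan)]
        next_index[agent] += 1
        events.append((t, agent, macro))
        heapq.heappush(queue, (t + durations[macro], agent))
    return events


durations = {"go_to_room": 3, "pick_up": 1, "wait": 2}  # hypothetical, deterministic
policies = {0: ["go_to_room", "pick_up"], 1: ["wait"]}
for t, agent, macro in simulate_macro_actions(durations, policies, horizon=6):
    print(f"t={t}: agent {agent} starts {macro}")
```

Here agent 0 decides at t = 0, 3, 4 and agent 1 at t = 0, 2, 4, so the agents' decision points only sometimes coincide; this is the situation the extended Dec-POMDP algorithms must reason about.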
Shayegan Omidshafiei, Dong-Ki Kim, Miao Liu, Gerald Tesauro, Matthew Riemer, Christopher Amato, Murray Campbell and Jonathan How. In the Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19), February 2019.
Collective human knowledge has clearly benefited from the fact that innovations by individuals are taught to others through communication. Similar to human social groups, agents in distributed learning systems would likely benefit from communication to share knowledge and teach skills. The problem of teaching to improve agent learning has been investigated by prior works, but these approaches make assumptions that prevent application of teaching to general multiagent problems, or require domain expertise for the problems to which they apply. This learning-to-teach problem has inherent complexities related to measuring the long-term impacts of teaching, which compound the standard multiagent coordination challenges. In contrast to existing works, this paper presents the first general framework and algorithm for intelligent agents to learn to teach in a multiagent environment. Our algorithm, Learning to Coordinate and Teach Reinforcement (LeCTR), addresses peer-to-peer teaching in cooperative multiagent reinforcement learning. Each agent in our approach learns both when and what to advise, then uses the received advice to improve local learning. Importantly, these roles are not fixed; the agents learn to assume the role of student and/or teacher at the appropriate moments, requesting and providing advice in order to improve teamwide performance and learning. Empirical comparisons against state-of-the-art teaching methods show that our teaching agents not only learn significantly faster, but also learn to coordinate in tasks where existing methods fail.
Christopher Amato. In the Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18), July 2018.
Multi-agent planning and learning methods are becoming increasingly important in today’s interconnected world. Methods for real-world domains, such as robotics, must consider uncertainty and limited communication in order to generate high-quality, robust solutions. This paper discusses our work on developing principled models to represent these problems, as well as planning and learning methods that can scale to realistic multi-agent and multi-robot tasks.
Nghia Hoang, Yuchen Xiao, Kavinayan Sivakumar, Christopher Amato and Jonathan P. How. In the Proceedings of the 2018 IEEE International Conference on Robotics and Automation (ICRA-18), May 2018.
A key challenge in multi-robot and multi-agent systems is generating solutions that are robust to other self-interested or even adversarial parties who actively try to prevent the agents from achieving their goals. Existing work addressing this challenge is practical only for small-scale synchronous decision-making scenarios or for a single agent planning its best response against a single adversary with fixed, procedurally characterized strategies. In contrast, this paper considers a more realistic class of problems in which a team of asynchronous agents with limited observation and communication capabilities needs to compete against multiple strategic adversaries with changing strategies. This problem necessitates agents that can coordinate to detect changes in adversary strategies and plan the best response accordingly. Our approach first optimizes a set of stratagems that represent these best responses. These optimized stratagems are then integrated into a unified policy that can detect and respond when the adversaries change their strategies. The near-optimality of the proposed framework is established theoretically as well as demonstrated empirically in simulation and hardware.
The Art of Drafting: A Team-Oriented Hero Recommendation System for Multiplayer Online Battle Arena Games
Zhengxing Chen, Truong-Huy D. Nguyen, Yuyu Xu, Christopher Amato, Seth Cooper, Yizhou Sun and Magy Seif El-Nasr. In the Proceedings of the ACM Conference on Recommender Systems (RecSys-18), October 2018.
Multiplayer Online Battle Arena (MOBA) games have become increasingly popular in recent years. In a match of such games, players compete in two teams of five, each player controlling an in-game avatar, known as a hero, selected from a roster of more than 100. The selection of heroes, also known as picking or drafting, takes place before the match starts and alternates between the two teams until each player has selected one hero. Heroes are designed with different strengths and weaknesses to promote team cooperation in a game. Intuitively, heroes in a strong team should complement each other’s strengths and suppress those of their opponents. Hero drafting is therefore a challenging problem due to the complex hero-to-hero relationships to consider. In this paper, we propose a novel hero recommendation system that suggests heroes to add to an existing team while maximizing the team’s prospect for victory. To that end, we model the drafting between two teams as a combinatorial game and use Monte Carlo Tree Search (MCTS) for estimating the values of hero combinations. Our empirical evaluation shows that hero teams drafted by our recommendation algorithm have significantly higher win rates against teams constructed by other baseline and state-of-the-art strategies.
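Monte Carlo Tree Search has to balance trying untested picks against refining its estimates of promising ones. A common way to do this at each tree node is the UCB1 rule used by UCT, sketched below for choosing the next hero to examine; whether the paper uses exactly this selection rule and exploration constant is an assumption, and the hero names and statistics are hypothetical. Each candidate's statistics are its visit count and the total reward (for example, the number of simulated wins) accumulated through it.

```python
import math
from typing import Dict, Optional, Tuple


def ucb1_select(stats: Dict[str, Tuple[int, float]],
                c: float = math.sqrt(2)) -> Optional[str]:
    """Pick the candidate maximizing mean reward + c * sqrt(ln N / n) (UCB1).

    `stats` maps each candidate move to (visit count n, total reward);
    N is the total number of visits across all candidates.
    Returns None if there are no candidates.
    """
    parent_visits = sum(n for n, _ in stats.values())
    best_move, best_score = None, -math.inf
    for move, (visits, total_reward) in stats.items():
        if visits == 0:
            return move  # try every untried candidate before exploiting
        exploit = total_reward / visits
        explore = c * math.sqrt(math.log(parent_visits) / visits)
        if exploit + explore > best_score:
            best_move, best_score = move, exploit + explore
    return best_move


print(ucb1_select({"hero_a": (10, 6.0), "hero_b": (2, 1.5), "hero_c": (0, 0.0)}))  # hero_c
print(ucb1_select({"hero_a": (10, 6.0), "hero_b": (2, 1.5)}))  # hero_b
```

In the second call, hero_b wins despite few visits because its higher mean win rate and larger exploration bonus outweigh hero_a's better-established estimate.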
Zhengxing Chen, Christopher Amato, Truong-Huy D. Nguyen, Seth Cooper, Yizhou Sun and Magy Seif El-Nasr. In the Proceedings of the IEEE Conference on Computational Intelligence and Games (CIG-18), August 2018.
Deck building is a crucial component of playing Collectible Card Games (CCGs). The goal of deck building is to choose a fixed-size subset of cards from a large card pool so that they work well together in-game against specific opponents. Existing methods either lack the flexibility to adapt to different opponents or require large computational resources, making them unsuitable for real-time or large-scale applications. We propose a new deck recommendation system, named Q-DeckRec, which learns a deck search policy during a training phase and uses it to solve deck-building problem instances. Our experimental results demonstrate that, after a training phase, Q-DeckRec requires fewer computational resources to build winning decks than several baseline methods.
COG-DICE: An Algorithm for Solving Continuous-Observation Dec-POMDPs
Madison Clark-Turner and Christopher Amato. In the Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), August 2017.
The decentralized partially observable Markov decision process (Dec-POMDP) is a powerful model for representing multi-agent problems with decentralized behavior. Unfortunately, current Dec-POMDP solution methods cannot solve problems with continuous observations, which are common in many real-world domains. To that end, we present a framework for representing and generating Dec-POMDP policies that explicitly include continuous observations. We apply our algorithm to a novel tagging problem and an extended version of a common benchmark, where it generates policies that meet or exceed the values of equivalent discretized domains without the need for finding an adequate discretization.
Learning in POMDPs with Monte Carlo Tree Search
Sammie Katt, Frans A. Oliehoek and Christopher Amato. In the Proceedings of the Thirty-Fourth International Conference on Machine Learning (ICML-17), August 2017.
The POMDP is a powerful framework for reasoning under outcome and information uncertainty, but constructing an accurate POMDP model is difficult. Bayes-Adaptive Partially Observable Markov Decision Processes (BA-POMDPs) extend POMDPs to allow the model to be learned during execution. BA-POMDPs are a Bayesian RL approach that, in principle, allows for an optimal trade-off between exploitation and exploration. Unfortunately, BA-POMDPs are currently impractical to solve for any non-trivial domain. In this paper, we extend the Monte-Carlo Tree Search method POMCP to BA-POMDPs and show that the resulting method, which we call BA-POMCP, is able to tackle problems that previous solution methods have been unable to solve. Additionally, we introduce several techniques that exploit the BA-POMDP structure to improve the efficiency of BA-POMCP, along with proofs of their convergence.
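In a BA-POMDP, the hidden state is augmented with Dirichlet counts over the unknown model, and every transition updates those counts; this is how the model can be learned during execution. The sketch below shows one transition of a tabular augmented state under simplifying assumptions: only transition counts are tracked (a full BA-POMDP also keeps observation counts), the next state is drawn from the expected (posterior-mean) model, and the function name and count layout are hypothetical rather than taken from the paper.

```python
import random
from typing import Dict, List, Tuple

Counts = Dict[Tuple[int, int], List[float]]  # (state, action) -> counts over next states


def ba_transition(state: int, counts: Counts, action: int,
                  rng: random.Random) -> Tuple[int, Counts]:
    """Sample s' with probability counts[(s, a)][s'] / sum(counts[(s, a)]) and
    return the new augmented state (s', counts with that transition incremented)."""
    row = counts[(state, action)]
    # random.choices normalizes the weights, so sampling follows the posterior mean.
    next_state = rng.choices(range(len(row)), weights=row, k=1)[0]
    # Copy the counts so that different simulations do not share mutable state.
    new_counts = {key: list(values) for key, values in counts.items()}
    new_counts[(state, action)][next_state] += 1.0
    return next_state, new_counts


rng = random.Random(0)
prior = {(0, 0): [1.0, 1.0], (1, 0): [1.0, 1.0]}  # uniform prior: 2 states, 1 action
s, posterior = ba_transition(0, prior, action=0, rng=rng)
print(s, posterior[(0, 0)])
```

Because the counts travel with the simulated state, a tree search over augmented states naturally accounts for how much each action would teach the agent about the model, which is the source of the principled exploration-exploitation trade-off.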
Policy Search for Multi-Robot Coordination under Uncertainty
Christopher Amato, George Konidaris, Ariel Anders, Gabriel Cruz, Jonathan P. How and Leslie P. Kaelbling. In the International Journal of Robotics Research (IJRR), vol. 35, issue 14, 2017.
We introduce a principled method for multi-robot coordination based on a general model of multi-robot cooperative planning, termed a MacDec-POMDP, that accounts for stochasticity, uncertain sensing, and communication limitations. A new MacDec-POMDP planning algorithm is presented that searches over policies represented as finite-state controllers, rather than the previous policy tree representation. Finite-state controllers can be much more concise than trees, are much easier to interpret, and can operate over an infinite horizon. The resulting policy search algorithm requires a substantially simpler simulator that models only the outcomes of executing a given set of motor controllers, not the details of the executions themselves, and it can solve significantly larger problems than existing MacDec-POMDP planners. We demonstrate significant performance improvements over previous methods and show that our method can be used for actual multi-robot systems through experiments on a cooperative multi-robot bartending domain.
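A finite-state controller stays small because it reuses nodes instead of branching on every observation history, which is also why it can run over an infinite horizon. The sketch below executes a deterministic controller; the node actions and observation labels for a drink-serving robot are hypothetical, and the sketch does not show how the paper's policy search algorithm finds good controllers.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass
class FiniteStateController:
    """Deterministic finite-state controller: each node fixes an action, and the
    observation received after acting selects the next node."""
    action: Dict[int, str]                  # node -> action (or macro-action)
    transition: Dict[Tuple[int, str], int]  # (node, observation) -> next node
    start: int = 0

    def run(self, observations: Iterable[str]) -> List[str]:
        """Return the action taken at each step for a stream of observations."""
        node = self.start
        taken = []
        for obs in observations:
            taken.append(self.action[node])
            node = self.transition[(node, obs)]
        return taken


# Hypothetical two-node controller for a drink-serving robot.
fsc = FiniteStateController(
    action={0: "wait_for_order", 1: "fetch_drink"},
    transition={(0, "no_order"): 0, (0, "order"): 1,
                (1, "failed"): 1, (1, "delivered"): 0},
)
print(fsc.run(["no_order", "order", "failed", "delivered"]))
```

By contrast, a policy tree that branches on two observations needs 2^h - 1 nodes for a horizon of h steps, while this controller keeps two nodes for any horizon.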