
Ming-Chuan Wu

Part-Time Lecturer - Seattle


Office Location

Northeastern University - Seattle
401 Terry Ave N., Suite 103
225 Terry Ave N., Suite 300
Seattle, WA 98109

Mailing Address

Northeastern University
ATTN: Ming-Chuan Wu, 301 ME
360 Huntington Avenue
Boston, MA 02115


  • PhD in Computer Science, Technische Universität Darmstadt


I currently work on the Machine Learning platform at Apple, which powers applied ML and ML research inside the company. I am also a guest lecturer at Northeastern University (Seattle Campus), teaching the course on large-scale parallel data processing in the cloud.

During my tenure at AWS, I led the design of an incubation project to deliver a new fully managed NoSQL database service. In addition, I delivered a working prototype that adds some of the key SQL features to AWS's flagship NoSQL database, DynamoDB. Both will be announced at re:Invent 2018.

From 2008 to 2016 at MSFT, my main focus was on building and supporting a web-scale data processing platform-as-a-service (PaaS) that provides query-as-a-service. I also managed the query optimizer team during that time. The systems include:
1. a web-scale batch data processing system named Cosmos/SCOPE, which powered MSFT’s Online Service business divisions and later evolved into U-SQL, part of MSFT’s public data solution;
2. an MPP (Massively Parallel Processing) system as a service for interactive analytic processing;
3. a web-scale in-memory graph database as a service (an incubation project at Microsoft Research) for low-latency, high-throughput OLTP workloads.

Before 2008, I worked mainly on query optimization in the SQL Server product group. In a nutshell, the areas of my work include:
– Large-scale in-memory graph database on top of distributed transactional memory
– Federated query processing for heterogeneous cloud-scale data sources
– Dynamic query optimization
– Cloud-scale distributed computation platform
– Query optimization for cloud-scale MapReduce environments
– Indexing strategies for MapReduce environments
– Large-scale testing infrastructure with automatic scaling
– Workload analysis and physical data design for big data
– Query optimization for relational databases
– Cardinality estimation and costing
– Parallel databases
– Bitmap indexing for OLAP and data warehouses

About Me

  • Hometown: Seattle
  • Field of research/teaching: Databases, Data Platforms as a Service, Parallel Data Processing

What are the specifics of your educational background?

PhD in Computer Science, Technische Universität Darmstadt, Germany.

What is your research focus in a bit more detail? Is your current research path what you always had in mind for yourself, or has it evolved somewhat? If so, how/why?

I am currently focusing on data platforms for ML; in the past I have worked on query optimization, parallel databases, and large-scale data platforms as a service.

What courses/subjects do you teach?

Parallel Data Processing

What are the specifics of your industry experience?

I worked on the SQL Server Query Optimizer for 7 years, followed by another 7 years at Bing building a cloud-scale data processing platform. Later I spent one year at Microsoft Research building a scale-out in-memory graph database for OLTP workloads. After Microsoft, I spent two years at AWS designing distributed transactions for DynamoDB.