Ming-Chuan Wu

Part-Time Lecturer - Seattle


Office Location

Northeastern University - Seattle
401 Terry Ave N., Suite 103
225 Terry Ave N., Suite 300
Seattle, WA 98109

Mailing Address

Northeastern University
ATTN: Ming-Chuan Wu, 301 ME
360 Huntington Avenue
Boston, MA 02115


I currently work on the machine learning platform at Apple, which powers applied ML and ML research inside the company. I am also a guest lecturer at Northeastern University (Seattle Campus), teaching the course on large-scale parallel data processing in the cloud.

During my tenure at AWS, I led the design of an incubation project to deliver a new fully managed NoSQL database service. In addition, I delivered a working prototype that adds some key SQL features to DynamoDB, the flagship NoSQL database at AWS. Both will be announced at re:Invent 2018.

From 2008 to 2016 at MSFT, my main focus was building and supporting a web-scale data processing platform as a service (PaaS) that provides query as a service. I also managed the query optimizer team during that time. The systems include:
1. a web-scale batch data processing system named Cosmos/SCOPE, which powered MSFT’s Online Service business divisions and later evolved into U-SQL, part of MSFT’s public data solution;
2. an MPP (Massively Parallel Processing) system as a service for interactive analytic processing;
3. a web-scale in-memory graph database as a service (an incubation project at Microsoft Research) for low-latency, high-throughput OLTP workloads.

Before 2008, I worked mainly on query optimization in the SQL Server product group. In a nutshell, the areas of my work include:
– Large-scale in-memory graph database on top of distributed transactional memory
– Federated query processing for heterogeneous cloud-scale data sources
– Dynamic query optimization
– Cloud-scale distributed computation platform
– Query optimization for cloud-scale MapReduce environments
– Indexing strategies in MapReduce environments
– Large-scale testing infrastructure with automatic scaling
– Workload analysis and physical data design for big data
– Query optimization for relational databases
– Cardinality estimation and costing
– Parallel databases
– Bitmap indexing for OLAP and data warehouses


  • PhD in Computer Science, Technische Universität Darmstadt

About Me

  • Hometown: Seattle
  • Field of research/teaching: Databases, Data Platform as a Service, Parallel Data Processing

What are the specifics of your educational background?

PhD in Computer Science, Technische Universität Darmstadt, Germany.

What is your research focus in a bit more detail? Is your current research path what you always had in mind for yourself, or has it evolved somewhat? If so, how/why?

I currently focus on data platforms for ML; in the past I worked on query optimization, parallel databases, and large-scale data platforms as a service.

What courses/subjects do you teach?

Parallel Data Processing

What are the specifics of your industry experience?

I worked on the SQL Server Query Optimizer for 7 years, followed by another 7 years at Bing building a cloud-scale data processing platform. Later I spent one year at Microsoft Research building a scale-out in-memory graph database for OLTP workloads. After Microsoft, I spent two years at AWS designing distributed transactions for DynamoDB.