Northeastern University - Seattle
401 Terry Ave N., Suite 103
225 Terry Ave N., Suite 300
Seattle, WA 98109
ATTN: Ming-Chuan Wu, 301 ME
360 Huntington Avenue
Boston, MA 02115
- Data platforms for machine learning
- Query optimization
- Parallel databases
- Large-scale data platforms as a service
- PhD in Computer Science, Technische Universität Darmstadt, Germany
Ming-Chuan Wu is a part-time lecturer in the Khoury College of Computer Sciences at Northeastern University’s Seattle campus, where he teaches large-scale parallel data processing in the cloud. Apart from teaching, he works on the machine learning platform at Apple to power applied machine learning and related research within the company.
During his tenure at Amazon Web Services, Wu led the design of an incubation project to deliver a new fully managed NoSQL database service. He also delivered a working prototype that adds key SQL features to Amazon's flagship NoSQL database, DynamoDB; these features were announced at re:Invent 2018. Wu spent eight years at Microsoft, where he focused on building and supporting a web-scale data processing platform-as-a-service (PaaS) that offered query-as-a-service. Until 2016, he managed the query optimizer team, with a focus on web-scale batch data processing systems (Cosmos/SCOPE), massively parallel processing (MPP) systems, and a web-scale in-memory graph database.
Where is your hometown?
What courses/subjects do you teach?
- CS 6240: Large-Scale Parallel Data Processing
What are the specifics of your industry experience?
I worked on the SQL Server query optimizer for seven years, followed by another seven years at Bing building a cloud-scale data processing platform. I then spent a year at Microsoft Research building a scale-out in-memory graph database for OLTP workloads. After Microsoft, I spent two years at AWS designing distributed transactions for DynamoDB.
Before 2008, I worked mainly on query optimization in the SQL Server product group. In a nutshell, my areas of work include:
- Large-scale in-memory graph database on top of distributed transactional memory
- Federated query processing for heterogeneous cloud-scale data sources
- Dynamic query optimization
- Cloud-scale distributed computation platform
- Query optimization for cloud-scale MapReduce environments
- Indexing strategies for MapReduce environments
- Large-scale testing infrastructure with automatic scaling
- Workload analysis and physical data design for big data
- Query optimization for relational databases
- Cardinality estimation and costing
- Parallel databases
- Bitmap indexing for OLAP and data warehouses