CS 6240: Large-Scale Parallel Data Processing
- Interested in CS 6240, but you do not meet the pre-reqs?
Please read this FAQ.
- Graduate course. Covers big-data analysis techniques that scale out with an
increasing number of compute nodes, e.g., for cloud computing. Focuses on
approaches for problem and data partitioning that distribute work
effectively while keeping total cost for computation and data transfer low.
Deterministic and random algorithms from a variety of domains, including
graphs, data mining, linear algebra, and information retrieval, are studied
and analyzed in terms of their cost, scalability, and robustness against
skew. Coursework emphasizes hands-on programming experience with modern
state-of-the-art big-data processing technology. Students who do not meet
course prerequisites may seek permission of the instructor.
CS 3800: Theory of Computation
(this course is managed completely on Canvas)
- Upper division undergraduate course. Introduces the theory behind computers and computing, aimed
at answering the question, “What are the capabilities and limitations of
computers?” Covers automata theory, computability, and complexity. The
automata theory portion includes finite automata, regular expressions, nondeterminism, nonregular languages, context-free languages, pushdown
automata, and non-context-free languages. The computability portion includes
Turing machines, the Church-Turing thesis, decidable languages, and the
Halting theorem. The complexity portion includes big-O and small-o notation,
the classes P and NP, the P vs. NP question, and NP-completeness.
CS 7240/7280: Principles of scalable data management: theory,
algorithms and database systems
- This course provides a rigorous introduction to the
algorithms, core principles, and foundational concepts for managing data at
scale. The emphasis is on both the high-level theoretical intuitions and
principles underlying scalable data management and the technical
details. Topics include data models and query languages, query optimization,
complexity of big-data analysis, data-stream processing, parallel data
processing, and probabilistic data management. Students will gain deep
algorithmic understanding through interactive classes and a project with
regular feedback. The latter will be flexible, allowing students to explore
scalable data management and analysis aspects related to their PhD research.
CS 7290: Special Topics in Data Science: Foundations in
Scalable Data Management
- This course explores research topics in analysis and
management of large data, with a focus on distributed and parallel
approaches, join processing, and imprecise data/approximation. We will
discuss and analyze papers covering applications, algorithms, systems, and
theory--with a focus on the most recent developments. This course is
designed for PhD students, as well as advanced Master's students with a solid
background in algorithms and one or more data-oriented areas of computer
science, including machine learning, AI, logic, information retrieval, and
security. A desired outcome of the course project is the creation of
research results that are publishable in a peer-reviewed conference.
CS 6240: Parallel Data Processing in MapReduce
- Graduate course. This course covers techniques for
analyzing very large data sets. We introduce the MapReduce programming model
and the core technologies it relies on in practice, such as a distributed
file system. Related approaches and technologies from distributed databases
and cloud computing will also be introduced. Particular emphasis is placed
on practical examples and hands-on programming experience. Both plain
MapReduce and database-inspired advanced programming models running on top
of a MapReduce infrastructure will be used.
CS 6220: Data Mining Techniques
- Graduate course. This course covers various aspects of data mining, including data
preprocessing, classification, ensemble methods, association rule mining, sequence
mining, and cluster analysis. The class project involves hands-on practice
of mining useful knowledge from a large database.
CS 3200: Database Design
- Upper division undergraduate course. This course studies the design of relational databases, including the
entity-relationship model, normalization, relational algebra, SQL, triggers,
stored procedures, indexing, elementary query optimization, and fundamentals
of concurrency and recovery. The class project involves working with a
commercial relational database management system and accessing it from an
CSG 339: Scalable Techniques for Massive Data
- Graduate course. We discuss influential and cutting-edge research papers from academia
and industry research groups. The course also has a project requirement
where students can choose a research project related to large-scale data