CS6140 13F: Homework 05

Assigned: Friday, November 8, 2013
Due: Wednesday, November 20, 2013


General Instructions

  1. Feel free to work with others on this assignment. However, you must acknowledge with whom you worked, you must write your own code, and you must create your own report.


Assignment

In this assignment, you will implement EM, the expectation-maximization algorithm, and study its applications on both real and synthetic data sets.

Some resources:

  • EM class notes
  • EM Notes 1
  • EM Notes 2
  • EM Notes 3
  • EM Notes 4
  • EM Notes 5
  • Multivariate Normal
  • Multivariate Normal - Marginal and Conditional Distributions
  • Correlation Coefficient
  • As always, you must write your own code to solve this problem.


    PROBLEM 1

    Mixtures of Gaussians in 2- and 3-dimensions.
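
    As a rough sketch of what EM for a mixture of Gaussians looks like on 2- or 3-dimensional data, here is one possible Python implementation (the function name em_gmm, the numpy/scipy usage, the fixed iteration count, and the initialization scheme are our own choices, not part of the assignment):

        import numpy as np
        from scipy.stats import multivariate_normal

        def em_gmm(X, k, n_iter=100, seed=0):
            """Minimal EM for a k-component Gaussian mixture on (n, d) data X."""
            rng = np.random.default_rng(seed)
            n, d = X.shape
            # Initialize: k random data points as means, the global covariance
            # for every component, and uniform mixing weights.
            mu = X[rng.choice(n, k, replace=False)]
            cov = np.array([np.cov(X.T) + 1e-6 * np.eye(d) for _ in range(k)])
            w = np.full(k, 1.0 / k)
            for _ in range(n_iter):
                # E-step: responsibilities r[i, j] = P(component j | x_i).
                r = np.column_stack([
                    w[j] * multivariate_normal.pdf(X, mu[j], cov[j])
                    for j in range(k)
                ])
                r /= r.sum(axis=1, keepdims=True)
                # M-step: re-estimate weights, means, and covariances from the
                # responsibilities; the 1e-6 * I term keeps covariances invertible.
                nk = r.sum(axis=0)
                w = nk / n
                mu = (r.T @ X) / nk[:, None]
                for j in range(k):
                    diff = X - mu[j]
                    cov[j] = (r[:, j, None] * diff).T @ diff / nk[j] + 1e-6 * np.eye(d)
            return w, mu, cov

    In practice you would monitor the log-likelihood and stop when it converges, rather than running a fixed number of iterations.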


    PROBLEM 2

    Naive Bayes classifier with a mixture of Gaussians applied to the Spam Data.

    Consider the Naive Bayes classifier you developed in HW02. There we modeled each 1-dimensional feature separately, in three variants: as a Bernoulli random variable, as a four-bucket histogram, and as a Gaussian. (We actually used two such distributions for each feature, one each for the positive and negative data.) In this problem, you will model each 1-dimensional feature using a mixture of k Gaussians.

    Use EM to estimate the 3k parameters for each feature

    mean_1, var_1, w_1; mean_2, var_2, w_2; ...; mean_k, var_k, w_k,

    where the weight vector is constrained by

    w_1 + w_2 + ... + w_k = 1.

    You will need separate mixtures for the positive and negative data, for each feature. We observed best results for k = 9 in our experiments, though feel free to experiment with that parameter.
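
    To make the per-feature estimation concrete, here is a hedged 1-D version of the same EM recipe (the name em_1d, the variance floor, and the initialization are again our own choices, not part of the assignment). Note that the M-step's weight update keeps w_1 + w_2 + ... + w_k = 1 automatically:

        import numpy as np

        def em_1d(x, k, n_iter=50, seed=0, var_floor=1e-4):
            """EM for a k-component 1-D Gaussian mixture on data x of shape (n,).

            Returns (w, mean, var), each an array of length k."""
            rng = np.random.default_rng(seed)
            mean = rng.choice(x, k, replace=False)
            var = np.full(k, x.var() + var_floor)
            w = np.full(k, 1.0 / k)
            for _ in range(n_iter):
                # E-step: responsibility of each component for each point.
                r = w * np.exp(-0.5 * (x[:, None] - mean) ** 2 / var) \
                    / np.sqrt(2 * np.pi * var)
                r /= r.sum(axis=1, keepdims=True)
                # M-step: weighted re-estimates; the weights sum to 1 by construction.
                nk = r.sum(axis=0)
                w = nk / len(x)
                mean = (r * x[:, None]).sum(axis=0) / nk
                var = (r * (x[:, None] - mean) ** 2).sum(axis=0) / nk + var_floor
            return w, mean, var

    The variance floor matters here: without some floor, a component can collapse onto a few repeated feature values and drive its variance to zero.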

    Train and test your algorithm on Fold 1, computing an ROC curve and the overall AUC. Compare your results on Fold 1 with the results you obtained for that fold using Naive Bayes with just one Gaussian.
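
    One possible way to turn the fitted mixtures into Naive Bayes scores and an ROC curve is sketched below (it assumes per-class, per-feature models stored as (w, mean, var) tuples from an em_1d-style routine; all function names here are hypothetical):

        import numpy as np

        def mixture_logpdf(x, w, mean, var):
            """Stable log of sum_j w_j * N(x; mean_j, var_j) for a feature column x."""
            log_comp = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                        - 0.5 * (x[:, None] - mean) ** 2 / var)
            m = log_comp.max(axis=1, keepdims=True)  # log-sum-exp trick
            return (m + np.log(np.exp(log_comp - m).sum(axis=1, keepdims=True))).ravel()

        def nb_scores(X, models_pos, models_neg, log_prior_pos, log_prior_neg):
            """Naive Bayes log-odds: sum of per-feature mixture log-likelihood ratios."""
            score = np.full(len(X), log_prior_pos - log_prior_neg)
            for f in range(X.shape[1]):
                score += mixture_logpdf(X[:, f], *models_pos[f])
                score -= mixture_logpdf(X[:, f], *models_neg[f])
            return score

        def roc_auc(scores, labels):
            """ROC points and AUC by sweeping a threshold over the sorted scores.

            labels is a 0/1 numpy array; ties in scores are not handled specially."""
            order = np.argsort(-scores)  # descending by score
            labels = labels[order]
            tpr = np.concatenate([[0.0], np.cumsum(labels) / labels.sum()])
            fpr = np.concatenate([[0.0], np.cumsum(1 - labels) / (1 - labels).sum()])
            auc = np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2)  # trapezoid rule
            return fpr, tpr, auc

    Plotting tpr against fpr gives the ROC curve for Fold 1; for the single-Gaussian comparison, the same scoring code works with k = 1 models.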


    Note: Two extra credit problems will be added shortly.