Implement the boosting algorithm as described in class. Note that the specification of boosting provides a clean interface between boosting (the meta-learner) and the underlying weak learning algorithm: in each round, boosting provides a weighted data set to the weak learner, and the weak learner returns a predictor. You may choose to keep this clean interface (which would allow you to run boosting over almost any weak learner), or you may more simply incorporate the weak learning algorithm inside your boosting code.
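For concreteness, here is a minimal sketch of the AdaBoost loop with the clean interface described above; the function and variable names are illustrative, not required by the assignment:

```python
import numpy as np

def adaboost(X, y, weak_learner, n_rounds=100):
    """AdaBoost meta-learner. Labels y must be in {-1, +1}. The weak learner is
    any function taking (X, y, weights) and returning an object with .predict(X)."""
    n = len(y)
    w = np.full(n, 1.0 / n)               # start with the uniform distribution
    predictors, alphas = [], []
    for _ in range(n_rounds):
        h = weak_learner(X, y, w)          # weak learner sees the weighted data set
        pred = h.predict(X)
        eps = w[pred != y].sum()           # weighted training error of this round's predictor
        eps = min(max(eps, 1e-12), 1 - 1e-12)   # guard against log(0) / division by zero
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        w = w * np.exp(-alpha * y * pred)  # up-weight mistakes, down-weight correct examples
        w /= w.sum()                       # renormalize to a distribution
        predictors.append(h)
        alphas.append(alpha)
    return predictors, alphas

# Usage (with a weak learner such as the ones sketched later in this assignment):
#   predictors, alphas = adaboost(X_train, y_train, best_stump_learner, n_rounds=100)
```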
Each predictor will correspond to a decision stump, which is just a feature-threshold pair. Note that for each feature f_i, you may have many possible thresholds, which we shall denote t_i,j.
Given an input instance to classify, a decision stump corresponding to feature f_i and threshold t_i,j will predict +1 if the input instance's value for feature f_i exceeds the threshold t_i,j; otherwise, it predicts -1.
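A decision stump can be represented directly as a (feature, threshold) pair; a minimal sketch (names illustrative) might look like this:

```python
import numpy as np

class Stump:
    """A decision stump: a (feature, threshold) pair predicting +1 when the
    feature's value exceeds the threshold and -1 otherwise."""
    def __init__(self, feature, threshold):
        self.feature = feature
        self.threshold = threshold

    def predict(self, X):
        return np.where(X[:, self.feature] > self.threshold, 1, -1)
```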
To create the various thresholds for each feature f_i, you should consider the values that feature takes on in the training data, removing any duplicate values.
You should also add two thresholds for each feature: one below all values for that feature and one above all values for that feature.
Note that by removing duplicate values, you will have fewer thresholds than examples for any given feature, and possibly far fewer.
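One possible sketch of threshold generation under these rules; placing the interior thresholds at midpoints between adjacent distinct values is an assumption of this sketch, not something the assignment mandates:

```python
import numpy as np

def thresholds_for_feature(values):
    """Candidate thresholds for one feature: one below all observed values,
    one above all observed values, and (as one reasonable choice) the midpoints
    between adjacent distinct values."""
    v = np.unique(values)                    # sorted, duplicates removed
    below, above = v[0] - 1.0, v[-1] + 1.0
    mids = (v[:-1] + v[1:]) / 2.0            # midpoints between consecutive distinct values
    return np.concatenate(([below], mids, [above]))
```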
Create a weak learner that returns the "best" decision stump with respect to the weighted training set given. Here, the "best" decision stump h is the one whose error is as far from 1/2 as possible; in other words, your goal is to maximize |error_w(h) - 1/2|, where error_w(h) denotes the weighted training error of h.
You should think carefully about how to efficiently search for such a decision stump so that your code runs in a reasonable amount of time.
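One way to make this search efficient is to sort the examples by each feature once and score every candidate threshold with a cumulative weight sum. The sketch below (reusing the Stump class from the earlier sketch) illustrates that idea under those assumptions; it is not a required implementation:

```python
import numpy as np

def best_stump_learner(X, y, w):
    """Return the stump whose weighted error is farthest from 1/2. Each feature is
    sorted once and every threshold is scored via a cumulative sum, giving roughly
    O(d * n log n) work instead of O(d * n^2)."""
    n, d = X.shape
    best, best_gap = None, -1.0
    for j in range(d):
        order = np.argsort(X[:, j])
        v, yw = X[order, j], (y * w)[order]
        # Error with the threshold below all values (stump predicts +1 everywhere):
        # it errs on exactly the negative examples.
        err0 = w[y == -1].sum()
        # Raising the threshold past v[k] flips example k's prediction to -1, which
        # changes the error by +w[k] if y[k] = +1 and by -w[k] if y[k] = -1.
        errors = np.concatenate(([err0], err0 + np.cumsum(yw)))
        thresholds = np.concatenate(([v[0] - 1.0], (v[:-1] + v[1:]) / 2.0, [v[-1] + 1.0]))
        # A midpoint is a genuine threshold only between *distinct* adjacent values.
        valid = np.concatenate(([True], v[:-1] < v[1:], [True]))
        gaps = np.abs(errors - 0.5)
        gaps[~valid] = -1.0                  # never select a degenerate threshold
        k = int(np.argmax(gaps))
        if gaps[k] > best_gap:
            best_gap, best = gaps[k], Stump(j, thresholds[k])
    return best
```

Note that the selected stump may have weighted error above 1/2; boosting handles this naturally, since such a stump simply receives a negative weight (equivalently, its predictions are flipped).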
Create a weak learner that returns a "random" decision stump, independent of the weighted training set given.
Note that you would almost certainly never do this in practice, but the point of this exercise is to demonstrate that boosting can leverage any weak predictor given, even one chosen at random.
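A minimal sketch of such a learner, reusing the Stump class and the thresholds_for_feature helper from the earlier sketches:

```python
import numpy as np

def random_stump_learner(X, y, w, rng=None):
    """Return a stump chosen at random, ignoring the weighted training set entirely.
    The (X, y, w) signature is kept only to match the weak-learner interface."""
    rng = np.random.default_rng() if rng is None else rng
    feature = int(rng.integers(X.shape[1]))
    threshold = rng.choice(thresholds_for_feature(X[:, feature]))
    return Stump(feature, threshold)
```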
Here we will work with the Spambase dataset from HW02, testing your implementations using Fold 1 as described in HW02. You can work with the preconditioned or non-preconditioned data; it should make little difference when boosting via decision stumps. (Consider why this is so...)
You should think carefully about how you can efficiently generate the required results above. For example, I would suggest keeping a running weighted linear prediction value (before thresholding at zero) for each training and testing instance: when each new round's predictor is created, you can simply update these running values and then easily compute training and testing error rates (by thresholding the values at zero), as well as testing AUCs (by ranking the instances by these values).
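A sketch of what this bookkeeping might look like; the helper names are illustrative:

```python
import numpy as np

def error_rate(scores, y):
    """Error rate of the thresholded prediction sign(score); ties go to -1 here."""
    return float(np.mean(np.where(scores > 0, 1, -1) != y))

def auc(scores, y):
    """AUC of the ranking induced by the scores: the fraction of (positive, negative)
    pairs ranked correctly, counting ties as 1/2."""
    pos, neg = scores[y == 1], scores[y == -1]
    diffs = pos[:, None] - neg[None, :]
    return float(((diffs > 0).sum() + 0.5 * (diffs == 0).sum()) / (len(pos) * len(neg)))

# Per-round bookkeeping: after round t produces (alpha_t, h_t), update the running
# scores once and read off the required quantities.
#   train_scores += alpha_t * h_t.predict(X_train)
#   test_scores  += alpha_t * h_t.predict(X_test)
#   record error_rate(train_scores, y_train), error_rate(test_scores, y_test),
#   and auc(test_scores, y_test)
```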
You should prepare a report describing your results above. You should also submit your code. You may hand in your report on paper or via e-mail, and you should submit your code via e-mail.