Implement the boosting algorithm as described in class. Note that the specification of boosting provides a clean interface between boosting (the meta-learner) and the underlying weak learning algorithm: in each round, boosting provides a weighted data set to the weak learner, and the weak learner returns a predictor. You may choose to keep this clean interface (which would allow you to run boosting over almost any weak learner), or you may more simply incorporate the weak learning algorithm inside your boosting code.
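For concreteness, here is a minimal sketch of the AdaBoost loop with the clean interface described above; the function and variable names are illustrative, not required by the assignment:

```python
import numpy as np

def adaboost(X, y, weak_learner, n_rounds=100):
    """AdaBoost meta-learner. Labels y must be in {-1, +1}. The weak learner is
    any function taking (X, y, weights) and returning an object with .predict(X)."""
    n = len(y)
    w = np.full(n, 1.0 / n)               # start with the uniform distribution
    predictors, alphas = [], []
    for _ in range(n_rounds):
        h = weak_learner(X, y, w)          # weak learner sees the weighted data set
        pred = h.predict(X)
        eps = w[pred != y].sum()           # weighted training error of this round's predictor
        eps = min(max(eps, 1e-12), 1 - 1e-12)   # guard against log(0) / division by zero
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        w = w * np.exp(-alpha * y * pred)  # up-weight mistakes, down-weight correct examples
        w /= w.sum()                       # renormalize to a distribution
        predictors.append(h)
        alphas.append(alpha)
    return predictors, alphas

# Usage (with a weak learner such as the ones sketched later in this assignment):
#   predictors, alphas = adaboost(X_train, y_train, best_stump_learner, n_rounds=100)
```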
Each predictor will correspond to a decision stump, which is just a feature-threshold pair. Note that for each feature f_i, you may have many possible thresholds, which we shall denote t_i,j.
Given an input instance to classify, a decision stump corresponding to feature f_i and threshold t_i,j will predict +1 if the input instance's value for feature f_i exceeds the threshold t_i,j; otherwise, it predicts -1.
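A decision stump can be represented directly as a (feature, threshold) pair; a minimal sketch (names illustrative) might look like this:

```python
import numpy as np

class Stump:
    """A decision stump: a (feature, threshold) pair predicting +1 when the
    feature's value exceeds the threshold and -1 otherwise."""
    def __init__(self, feature, threshold):
        self.feature = feature
        self.threshold = threshold

    def predict(self, X):
        return np.where(X[:, self.feature] > self.threshold, 1, -1)
```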
To create the various thresholds for each feature f_i, you should consider the values that feature takes on in the training data, removing any duplicate values.
You should also add two thresholds for each feature: one below all values for that feature and one above all values for that feature.
Note that by removing duplicate values, you will have fewer thresholds than examples for any given feature, and possibly far fewer.
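One possible sketch of threshold generation under these rules; placing the interior thresholds at midpoints between adjacent distinct values is an assumption of this sketch, not something the assignment mandates:

```python
import numpy as np

def thresholds_for_feature(values):
    """Candidate thresholds for one feature: one below all observed values,
    one above all observed values, and (as one reasonable choice) the midpoints
    between adjacent distinct values."""
    v = np.unique(values)                    # sorted, duplicates removed
    below, above = v[0] - 1.0, v[-1] + 1.0
    mids = (v[:-1] + v[1:]) / 2.0            # midpoints between consecutive distinct values
    return np.concatenate(([below], mids, [above]))
```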
Create a weak learner that returns the "best" decision stump with respect to the weighted training set given. Here, the "best" decision stump h is the one whose error is as far from 1/2 as possible; in other words, your goal is to maximize |error_w(h) - 1/2|, where error_w(h) denotes the weighted training error of h.
You should think carefully about how to efficiently search for such a decision stump so that your code runs in a reasonable amount of time.
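One way to make this search efficient is to sort the examples by each feature once and score every candidate threshold with a cumulative weight sum. The sketch below (reusing the Stump class from the earlier sketch) illustrates that idea under those assumptions; it is not a required implementation:

```python
import numpy as np

def best_stump_learner(X, y, w):
    """Return the stump whose weighted error is farthest from 1/2. Each feature is
    sorted once and every threshold is scored via a cumulative sum, giving roughly
    O(d * n log n) work instead of O(d * n^2)."""
    n, d = X.shape
    best, best_gap = None, -1.0
    for j in range(d):
        order = np.argsort(X[:, j])
        v, yw = X[order, j], (y * w)[order]
        # Error with the threshold below all values (stump predicts +1 everywhere):
        # it errs on exactly the negative examples.
        err0 = w[y == -1].sum()
        # Raising the threshold past v[k] flips example k's prediction to -1, which
        # changes the error by +w[k] if y[k] = +1 and by -w[k] if y[k] = -1.
        errors = np.concatenate(([err0], err0 + np.cumsum(yw)))
        thresholds = np.concatenate(([v[0] - 1.0], (v[:-1] + v[1:]) / 2.0, [v[-1] + 1.0]))
        # A midpoint is a genuine threshold only between *distinct* adjacent values.
        valid = np.concatenate(([True], v[:-1] < v[1:], [True]))
        gaps = np.abs(errors - 0.5)
        gaps[~valid] = -1.0                  # never select a degenerate threshold
        k = int(np.argmax(gaps))
        if gaps[k] > best_gap:
            best_gap, best = gaps[k], Stump(j, thresholds[k])
    return best
```

Note that the selected stump may have weighted error above 1/2; boosting handles this naturally, since such a stump simply receives a negative weight (equivalently, its predictions are flipped).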
Create a weak learner that returns a "random" decision stump, independent of the weighted training set given.
Note that you would almost certainly never do this in practice, but the point of this exercise is to demonstrate that boosting can leverage any weak predictor given, even one chosen at random.
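A minimal sketch of such a learner, reusing the Stump class and the thresholds_for_feature helper from the earlier sketches:

```python
import numpy as np

def random_stump_learner(X, y, w, rng=None):
    """Return a stump chosen at random, ignoring the weighted training set entirely.
    The (X, y, w) signature is kept only to match the weak-learner interface."""
    rng = np.random.default_rng() if rng is None else rng
    feature = int(rng.integers(X.shape[1]))
    threshold = rng.choice(thresholds_for_feature(X[:, feature]))
    return Stump(feature, threshold)
```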
Here we will work with the Spambase dataset from HW02, testing your implementations using Fold 1 as described in HW02. You can work with the preconditioned or non-preconditioned data; it should make little difference when boosting via decision stumps. (Consider why this is so...)
You should think carefully about how you can efficiently generate the required results above. For example, I would suggest keeping a running weighted linear prediction value (before thresholding at zero) for each training and testing instance: when each new round's predictor is created, you can simply update these running values and then easily compute training and testing error rates (by thresholding the values at zero), as well as testing AUCs (by ranking the instances by these values).
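A sketch of what this bookkeeping might look like; the helper names are illustrative:

```python
import numpy as np

def error_rate(scores, y):
    """Error rate of the thresholded prediction sign(score); ties go to -1 here."""
    return float(np.mean(np.where(scores > 0, 1, -1) != y))

def auc(scores, y):
    """AUC of the ranking induced by the scores: the fraction of (positive, negative)
    pairs ranked correctly, counting ties as 1/2."""
    pos, neg = scores[y == 1], scores[y == -1]
    diffs = pos[:, None] - neg[None, :]
    return float(((diffs > 0).sum() + 0.5 * (diffs == 0).sum()) / (len(pos) * len(neg)))

# Per-round bookkeeping: after round t produces (alpha_t, h_t), update the running
# scores once and read off the required quantities.
#   train_scores += alpha_t * h_t.predict(X_train)
#   test_scores  += alpha_t * h_t.predict(X_test)
#   record error_rate(train_scores, y_train), error_rate(test_scores, y_test),
#   and auc(test_scores, y_test)
```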
You should prepare a report describing your results above. You should also submit your code. You may hand in your report on paper or via e-mail, and you should submit your code via e-mail.