CS6140 Machine Learning
HW2A  Gradient Descent , Perceptron
Make sure you check the syllabus for the due date. Please
use the notations adopted in class, even if the problem is stated
in the book using a different notation.
Make sure to read the notes on Gradient Descent for Regression,
and chapter 5 of DHS, up to (including 5.6 Relaxation Procedures)
PROBLEM 1 [50 points]
A) Run Linear Regression regularized (Ridge) with normal equations.
B) Train Linear Regression using Gradient Descent (and then test) on
Spambase and Housing datasets form HW1.
C) Run Gradient Descent Logistic Regression on Spambase data.
Note: Normalization matters. When you
normalize data features (one feature at a time), you need to
normalize all data (train, test, validation) together, as opposed
to normalize separately training and testing sets.
Compare for each dataset training and testing performance across all
four learning algorithms by making a table like below

Decision or Regression Tree

Linear Regression (Normal Equations)

Linear Ridge Regression (Normal Equations)

Linear Regression(Gradient Descent)

LogisticRegression(Gradeint Descent)

Spambase

Train ACC:
Test ACC:

Train ACC:
Test ACC: 
Train ACC:
Test ACC: 
Train ACC:
Test ACC: 
Train ACC:
Test ACC: 
Housing

Train MSE:
Test MSE: 
Train MSE:
Test MSE: 
Train MSE:
Test MSE: 
Train MSE:
Test MSE: 
N/A .  WHY?

C) For classification (Spambase), produce Confusion Matrices
(TruePos, FalsePos, TrueNeg, FalseNeg) for Decision
Trees, Linear Regression and Logistic Regression  three 2x2
matrices. You will have to use a fixed threshold for each
regression algorithm.
D) For classification (Spambase), produce ROC plots comparison
between linear regression and logistic regression  two curves.
Compute the AUC for each curve.
PROBLEM 2 [50 points] Perceptron Algorithm (Gradient Descent for
a different objective)
Step 1: Dowload the perceptron learning data set that I have created.
The data set is tab delimited with 5 fields, where the first 4
fields are feature values and the last field is the {+1,1} label;
there are 1,000 total data points.
Step 2: Create a perceptron learning algorithm, as described in
class.
Step 3: Run your perceptron learning algorithm on the data set
provided. Keep track of how many iterations you perform until
convergence, as well as how many total updates (corresponding to
mistakes) that occur through each iteration. After convergence, your
code should output the raw weights, as well as the normalized
weights corresponding to the linear classifier
w1 x1 + w2 x2 + w3 x3 + w4 x4 = 1
(You will create the normalized weights by dividing your perceptron
weights w1, w2, w3, and w4 by w0, the weight corresponding to the
special "offset" feature.)
Step 4: Output the result of your perceptron learning algorithm as
described above. Your output should look something like the
following:
[jaa@jaalaptop Perceptron]$ perceptron.pl perceptronData.txt
Iteration
1 , total_mistake 136
Iteration 2 , total_mistake 68
Iteration 3 , total_mistake 50
Iteration 4 , total_mistake 22
Iteration 5 , total_mistake 21
Iteration 6 , total_mistake 34
Iteration 7 , total_mistake 25
Iteration 8 , total_mistake 0
Classifier weights: 17 1.62036704608359 3.27065807088159
4.63999040888332 6.79421449422058 8.26056991916346 9.36697370729981
Normalized with threshold: 0.0953157085931524 0.192391651228329
0.272940612287254 0.399659676130622 0.485915877597851
0.550998453370577
(Note: The output above corresponds to running on a different data
set than yours which has six dimensions as opposed to four. Your
results will be different, but you should convey the same
information as above.)
PROBLEM 3 [30 points]
Read prof Andrew Ng's lecture on ML
practice advice. Write a brief summary (1 page) explaining
the quantities in the lecture and the advice.
Read prof Pedro Domingos's paper on A
Few Useful Things to Know about Machine Learning. Write a
brief summary (1 page), with bullet points.
PROBLEM 4 [20 points, GR_ONLY]
Run Logistic Regression on the Spambase dataset, but using Newton's
numerical method instead of Gradient Descent. An intro to Newton's
method can be found in the lecture notes.
PROBLEM 5 [20 points]
Given a ranking of binary items by prediction score, the ROC is the
curve plotted of True Positives vs False Positives for all possible
thresholds. The AUC is the area under the ROC curve.
Prove that the AUC is also the percentage of item pairs (i,j) in
correct order.
As an illustrative example, the classifier output might be

object score
truelabel

A 100
1
B
99 1
C
96 0
D
95 1
E
90 1
F
85 0
G
82 1
H
60 0
K
40 0
I
38 0
The ROC curve is obtained by truncating the list above at all ranks,
and for each such threshold computing falsepositiverate and
truepositiverate (and plot them).
The problem asks to show that the area under the ROC curve is
approximated by the percentage of pairs in correct order. In this
example the item pairs in incorrect order are (C,D), (C,E),
(C,G), (F,G)
PROBLEM 6 [Extra Credit]
DHS chapter 5 Pb 2 (page 271)
PROBLEM 7 [Extra Credit]
DHS chapter 5 Pb 5, page 271
PROBLEM 8 [Extra Credit]
DHS chapter 5 Pb 6, page 271
PROBLEM 9 [GRONLY]
For a function f(x1,x2,..., xn) with real values, the "Hessian"
is the matrix of partial second derivatives
Consider the loglikelihood function for logistic regression
Show that its Hessian matrix H is negative semidefinite, i.e. for
any vector z satisfies
Remark: This fact is sometimes written
$\mathrm{H\le 0}$and implies the loglikelihood function
is concave.
Hint: