LP for set cover:
================

    min   Sum_S x_S c(S)
    s.t.
          Sum_{S containing e} x_S >= 1, for all e in U
          x_S >= 0

LP-based greedy algorithm for set cover: (Hooman's algorithm)
=======================================

1. Solve the LP.
2. Sort the x_S in decreasing order.
3. Select the smallest prefix of this sequence of sets (in the sorted
   order) that covers all of the elements.

Counterexample for the above algorithm:
======================================

The approach is to build a system in which any solution will have to
select at least one set from a subcollection of high-cost sets since
these sets are the only ones that contain a particular element.  Now,
the optimal LP solution may pick these sets fractionally.  We set the
system up in such a way that the fractional values for these sets in
the LP solution exceed the fractional values chosen for a large number
of other sets, which are the ones that primarily cover the other
elements that belong to the expensive set collection.

Define

    A = {a}, B = {b_1, b_2, ..., b_n}, C = {c_1, c_2, ..., c_{2n-2}}

    U = A U B U C = {a, b_1, b_2, ..., b_n, c_1, c_2, ..., c_{2n-2}}

Collection of sets = X U Y U Z

    X = {{a,b_1}, {a,b_2}, ..., {a,b_n}}
    Y = {{c_1}, {c_2}, ..., {c_{2n-2}}}
    Z = {{b_1,c_1}, ..., {b_1,c_{2n-2}},
         {b_2,c_1}, ..., {b_2,c_{2n-2}},
         ...,
         {b_n,c_1}, ..., {b_n,c_{2n-2}}}

    c(S) = L for S in X, 1 for S in Y and 2 for S in Z, L >> 1

Here is a solution x for the set cover LP.

    x_S = 1/n    for all S in X
    x_S = 1/2    for all S in Y
    x_S = 1/(2n) for all S in Z

Feasibility of x:
================

-- For the element a,
   Sum_{S containing a} x_S = n*(1/n) = 1
-- For an element e in {b_1,..,b_n},
   Sum_{S containing e} x_S = 1/n + (2n-2)/(2n) = 1
-- For an element e in {c_1, c_2, ..., c_{2n-2}},
   Sum_{S containing e} x_S = 1/2 + n/(2n) = 1

Optimality of x:
===============

The cost of x = L + (2n-2)/2 + 2*n(2n-2)/(2n) = L+n-1+2n-2 = L+3n-3.

Recall that the dual LP is:

    max   Sum_{e in U} y_e
    s.t.
          Sum_{e in S} y_e <= c(S), for all sets S
          y_e >= 0

Consider the following solution y to the dual LP.

    y_e = L - 1   if e = a
        = 1       for e in B U C

The solution is feasible since the sum of y_e is L for each set in X,
1 for each set in Y and 2 for each set in Z.  The cost of y =
(L-1) + n + (2n-2) = L+3n-3.  Since the cost of x equals the cost of y,
x and y are both optimal for their respective LPs.

Performance of the LP-based greedy algorithm:
============================================

For n >= 2 we have 1/2 >= 1/n > 1/(2n), so in the sorted order every
set of X and Y precedes every set of Z.  Each element of B U C lies in
exactly one set of X U Y, so the smallest covering prefix is exactly
X U Y.  The above algorithm will therefore select X U Y for a total
cost of nL+2n-2.  A good integral solution is to select one set in X,
all sets in Y, and n sets in Z (one for each b_i), yielding a total
cost of L + 2n-2 + 2n = L+4n-2.  For fixed n, if we make L large, the
ratio (nL+2n-2)/(L+4n-2) tends to n, so the approximation ratio of the
algorithm tends to n.
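
The construction can be checked numerically for small n.  The sketch
below is only an illustration, not part of the original argument: it
builds the instance, verifies the primal and dual solutions given
above with exact rational arithmetic, and applies steps 2 and 3 of the
algorithm.  It takes the fractional solution x from the text as given
rather than calling an LP solver, since the LP optimum need not be
unique and a solver may return a different optimal solution.  All
function names (build_instance, is_primal_feasible, dual_value,
greedy_prefix) and the choice n = 5, L = 1000 are assumptions made for
the example.

```python
from fractions import Fraction


def build_instance(n, L):
    """Build U, the sets X U Y U Z, their costs, and the LP solution x."""
    a = "a"
    B = [f"b{i}" for i in range(1, n + 1)]
    C = [f"c{j}" for j in range(1, 2 * n - 1)]  # c_1 .. c_{2n-2}
    universe = {a, *B, *C}
    sets, cost, x = {}, {}, {}
    for b in B:  # X: {a, b_i}, cost L, x_S = 1/n
        s = ("X", b)
        sets[s], cost[s], x[s] = frozenset({a, b}), L, Fraction(1, n)
    for c in C:  # Y: {c_j}, cost 1, x_S = 1/2
        s = ("Y", c)
        sets[s], cost[s], x[s] = frozenset({c}), 1, Fraction(1, 2)
    for b in B:  # Z: {b_i, c_j}, cost 2, x_S = 1/(2n)
        for c in C:
            s = ("Z", b, c)
            sets[s], cost[s], x[s] = frozenset({b, c}), 2, Fraction(1, 2 * n)
    return universe, sets, cost, x


def is_primal_feasible(universe, sets, x):
    """Check Sum_{S containing e} x_S >= 1 for every element e."""
    return all(
        sum(x[s] for s in sets if e in sets[s]) >= 1 for e in universe
    )


def dual_value(universe, sets, cost, L):
    """Check the dual solution y from the text and return its value."""
    y = {e: (L - 1 if e == "a" else 1) for e in universe}
    assert all(sum(y[e] for e in sets[s]) <= cost[s] for s in sets)
    return sum(y.values())


def greedy_prefix(universe, sets, x):
    """Steps 2-3: sort by x_S decreasing, take the shortest covering prefix."""
    order = sorted(sets, key=lambda s: x[s], reverse=True)
    covered, chosen = set(), []
    for s in order:
        if covered >= universe:
            break
        chosen.append(s)
        covered |= sets[s]
    return chosen


n, L = 5, 1000
universe, sets, cost, x = build_instance(n, L)
lp_cost = sum(cost[s] * x[s] for s in sets)        # L + 3n - 3
greedy_cost = sum(cost[s] for s in greedy_prefix(universe, sets, x))  # nL + 2n - 2
print(lp_cost, greedy_cost)  # prints: 1012 5008
```

With n = 5 and L = 1000 the primal and dual values both equal
L+3n-3 = 1012, while the prefix rule pays nL+2n-2 = 5008, matching the
formulas above.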