From rraj@ccs.neu.edu Tue Apr 13 10:49:01 2004 Date: Mon, 23 Feb 2004 23:39:46 -0500 (EST) From: Rajmohan Rajaraman To: Jiangzhuo Chen Cc: csg399@ccs.neu.edu Subject: Re: reminder: CSG399 discussion today at 4 Looks like the authors of the link Chen forwarded were going along the same path as we were trying to in class. One problem with this analysis approach is that we are mimicking the proof that we went through in class (and the one given in the text) for the ordinary set cover problem. It seems difficult to escape the H_{total coverage} kind of a bound since the analysis assigns a price to each time an element is covered, and tries to bound each price *separately*. Let us reconsider the algorithm that Jing Wang (and perhaps, others) proposed. More precisely, -- We maintain universe U to be the set of elements that have not been completely covered. -- In each round, we select a set S that has the smallest value for the ratio of cost(S) to the number of elements in the intersection of S with U. -- We choose r copies of S, where r is the smallest requirement of an element in S. -- We remove all elements from U that have been completely covered and repeat. Before analyzing the above algorithm, note that for the sake of analysis we can assume that we actually only select 1 copy of S and instead increase the number of rounds. The performance will be the same. The reason for selecting several copies of S is to make the running time polynomial. To prove that the above algorithm yields a H_n approximation, we will follow a different analysis. Essentially, what we will try to prove is that the approximation ratio is H_m, where m is the largest cardinality of a set. Note that m <= n. This is Problem 2.8 of the textbook for the ordinary set cover problem. We will still associate a price with each element for every time we cover it in the same way as before. That is if element e_j is covered the kth time by selecting S, then we set p(e_j,k) to be cost(S) divided by the size of the intersection of S and U. Now, if we add p(e_j,k), for k from 1 to r_j (e_j's requirement), and for j from 1 to n, we get the total cost of the solution. So it is sufficient to bound the sum of the p(.,.)'s. Consider an arbitrary set S. Suppose we sort all the elements of S according to the time at which their requirements were completely satisfied. Then it can be shown that sum of the p(e_j,r_j) values, over all e_j in S, is H_{|S|}. (This requires some thought.) The rest of the H_m proof can then be derived, again with some work. I think this is a worthwhile exercise for each of us to work on individually. I would encourage all of you to go over this and try to complete the proof (assuming it is correct!) and write it up before Wednesday's class. A good starting point for reference reading is the analysis of the greedy set cover algorithm given in Cormen-Leiserson-Rivest-Stein. If there are any concerns about the above idea, please do send email to the list. Best, Rajmohan. On Mon, 23 Feb 2004, Jiangzhuo Chen wrote: > When I am googling, I find this: > http://cs-www.cs.yale.edu/homes/zhengma/cs465/as1/soln1.ps > > It solves 2.9 by assuming a target of O(H_n) instead of exactly H_n. And > it also proposes an example for not doing H_n. Interesting ... > > Best, > ~~~~~~~~~~~~~~~~~~~~~ > ~ Jiangzhuo Chen ~ > ~ chenj@ccs.neu.edu ~ > ~ (617) 373-5793 ~ > ~~~~~~~~~~~~~~~~~~~~~ > From rraj@ccs.neu.edu Tue Apr 13 10:49:44 2004 Date: Tue, 24 Feb 2004 21:37:58 -0500 (EST) From: Rajmohan Rajaraman To: csg399@ccs.neu.edu Subject: set multicover problem Joe Willey pointed out a typo in my earlier email. > time at which their requirements were completely satisfied. Then it can > be shown that sum of the p(e_j,r_j) values, over all e_j in S, is H_{|S|}. The last line should read: be shown that sum of the p(e_j,r_j) values, over all e_j in S, is at most Cost(S)*H_{|S|}. Rajmohan.