From rraj@ccs.neu.edu Tue Apr 13 10:49:01 2004
Date: Mon, 23 Feb 2004 23:39:46 -0500 (EST)
From: Rajmohan Rajaraman <rraj@ccs.neu.edu>
To: Jiangzhuo Chen <chenj@ccs.neu.edu>
Cc: csg399@ccs.neu.edu
Subject: Re: reminder: CSG399 discussion today at 4

Looks like the authors of the link Chen forwarded were going along the
same path as we were trying to in class.  One problem with this analysis
approach is that we are mimicking the proof that we went through in class
(and the one given in the text) for the ordinary set cover problem.  It
seems difficult to escape the H_{total coverage} kind of a bound since the
analysis assigns a price to each time an element is covered, and tries to
bound each price *separately*.

Let us reconsider the algorithm that Jing Wang (and perhaps, others)
proposed.  More precisely,

-- We maintain universe U to be the set of elements that have not been
completely covered.

-- In each round, we select a set S that has the smallest value for the
ratio of cost(S) to the number of elements in the intersection of S with
U.

-- We choose r copies of S, where r is the smallest requirement of an
element in S.

-- We remove all elements from U that have been completely covered and
repeat.

Before analyzing the above algorithm, note that for the sake of analysis
we can assume that we actually only select 1 copy of S and instead
increase the number of rounds.  The performance will be the same.  The
reason for selecting several copies of S is to make the running time
polynomial.

To prove that the above algorithm yields a H_n approximation, we will
follow a different analysis.  Essentially, what we will try to prove is
that the approximation ratio is H_m, where m is the largest cardinality of
a set.  Note that m <= n.  This is Problem 2.8 of the textbook for the
ordinary set cover problem.

We will still associate a price with each element for every time we cover
it in the same way as before.  That is if element e_j is covered the kth
time by selecting S, then we set p(e_j,k) to be cost(S) divided by the
size of the intersection of S and U.  Now, if we add p(e_j,k), for k from
1 to r_j (e_j's requirement), and for j from 1 to n, we get the total cost
of the solution.

So it is sufficient to bound the sum of the p(.,.)'s.  Consider an
arbitrary set S.  Suppose we sort all the elements of S according to the
time at which their requirements were completely satisfied.  Then it can
be shown that sum of the p(e_j,r_j) values, over all e_j in S, is H_{|S|}.
(This requires some thought.)

The rest of the H_m proof can then be derived, again with some work.  I
think this is a worthwhile exercise for each of us to work on
individually.  I would encourage all of you to go over this and try to
complete the proof (assuming it is correct!)  and write it up before
Wednesday's class.

A good starting point for reference reading is the analysis of the greedy
set cover algorithm given in Cormen-Leiserson-Rivest-Stein.

If there are any concerns about the above idea, please do send email to
the list.  Best,

Rajmohan.

On Mon, 23 Feb 2004, Jiangzhuo Chen wrote:

> When I am googling, I find this:
> http://cs-www.cs.yale.edu/homes/zhengma/cs465/as1/soln1.ps
>
> It solves 2.9 by assuming a target of O(H_n) instead of exactly H_n. And
> it also proposes an example for not doing H_n. Interesting ...
>
> Best,
> ~~~~~~~~~~~~~~~~~~~~~
> ~ Jiangzhuo Chen    ~
> ~ chenj@ccs.neu.edu ~
> ~ (617) 373-5793    ~
> ~~~~~~~~~~~~~~~~~~~~~
>

From rraj@ccs.neu.edu Tue Apr 13 10:49:44 2004
Date: Tue, 24 Feb 2004 21:37:58 -0500 (EST)
From: Rajmohan Rajaraman <rraj@ccs.neu.edu>
To: csg399@ccs.neu.edu
Subject: set multicover problem

Joe Willey pointed out a typo in my earlier email.

> time at which their requirements were completely satisfied.  Then it can
> be shown that sum of the p(e_j,r_j) values, over all e_j in S, is H_{|S|}.

The last line should read:

be shown that sum of the p(e_j,r_j) values, over all e_j in S, is
at most Cost(S)*H_{|S|}.

Rajmohan.