The birthday problem (and indicator random variables)
How are the following three questions different?
1) How many people should be in a room so that we expect at least one birthday match?
2) How many people should be in a room so that the probability of having at least one birthday match is greater than 1/2?
3) How many people must we ask, one at a time, so that we expect to find at least one birthday match?
The answer to (1) is approximately 28, derived using IRVs in the course notes.
Note that saying "we expect to get one birthday match" has nothing to do with guaranteeing a match. It should be interpreted as the average outcome of an experiment. In other words, consider setting up a huge number of rooms, say N. Each of these N rooms will have 28 people. Count the total number of birthday matches found in each separate room.
When we divide this by N, we get the expected number of matches per room. Note that using linearity of expectation, we can
get one from the other.
The answer to (2) is approximately 23, derived using conditional probability, or via a counting argument, as shown in the course notes.
This is not an average number of matches; it is a median. It does not involve expectation (e.g., we cannot interpret this as expect 0.5 matches). If we had N rooms, each with 23 people, we would expect to get N/2 rooms with no match, and N/2 rooms with at least one match (ignoring the fact that the answer wasn't an integer).
When we derive the value 23, all we were are doing is summing up the number of times that we won't find a match vs the number of times that we will find at least one; we don't care about how many matches occur in a room if at least one match is there. That is the difference between average and median.
Partition the N rooms, each with 23 people, based on which ones contain no match vs at least one match, and look at how these rooms would contribute to the total number of birthday matches. The "no-match" rooms would contribute zero, and we said half of the rooms will be of this type.
The "match" rooms could contribute any number of matches from 1 to 23-choose-2. These rooms would have to average exactly 2 matches, for the overall average to be 1.
But that is not the case. It turns out that, on average, the N/2 "match" rooms will actually contribute fewer than 2 matches each (but at least one each). Combined with the zero matches in the other N/2 rooms this results in an overall average of less than 1. That is why we actually need more people (28) in each room, to expect 1 match per room.
The answer to (3) is approximately 24.
That is because it is different to be forced to use k people in your experiment every single time, than to allow a count from 1 to n. (Note that we just need n>365, to guarantee a match). If you repeat this third version N times, sometimes you'll get a match with 2 people, and in theory sometimes you'll actually have to ask 366 people (although we've seen that that won't happen in the lifetime of our universe). The math involved for this 3rd version is slightly trickier than the other two.
So, depending on how you formulate your question about birthdays, you get different answers.
Think about this before you go making any bets!
Back to my notes on IRVs