"Dad? How many times do I have to roll a die until all six sides appear?"
I stopped what I was doing to consider my son's question. Although I could figure out the answer mathematically, sometimes experiments are more powerful than math equations for showing how probability works.
"Why don't we find out?" I said. "You roll and I'll keep track of the numbers that appear."
Each trial took at least six rolls, but, eventually, we completed 20 trials. The following SAS DATA step contains the number of rolls required until all six faces appeared on the die. The UNIVARIATE procedure displays a histogram of the data:
data RollsUntilAllFaces; input rolls @@; datalines; 6 7 7 7 8 9 9 9 10 10 11 12 12 12 15 16 17 19 20 26 ; proc univariate data=RollsUntilAllFaces; histogram rolls / endpoints=6 to 27 by 3; run;
The Expected Value
It's not hard to write down the expected number of rolls for a single die. You need one roll to see the first face. After that, the probability of rolling a different number is 5/6. Therefore, on average, you expect the second face after 6/5 rolls. After that value appears, the probability of rolling a new face is 4/6, and therefore you expect the third face after 6/4 rolls. Continuing this process leads to the conclusion that the expected number of rolls before all six faces appear is
6/6 + 6/5 + 6/4 + 6/3 + 6/2 + 6/1 = 14.7 rolls.When I told my son that "you should expect to roll the die 14.7 times before all six faces appear," he was unimpressed.
"That's too high," he retorted. "Most of the time we only need 10–12 rolls."
The Median Value
He and I had previously discussed the mean, median, and mode, so I knew what he meant when he told me that I was "using the wrong average." And I agree with him. The median value in our experiment is 10.5, but the mean value (which is strongly influenced by the trials with values 19, 20, and 26) is 12.1. The long tail in the histogram makes it apparent that the median and mean values will be substantially different.
So what is the expected median number of rolls before all six faces appear? I don't know. The only way that I know to estimate the median is through simulation. If you know a reference or a better way, leave a comment.
The General Problem
This problem is called the "coupon collector's problem." If you roll two dice, it is harder (but still possible) to find the expected number of rolls because the outcomes 2–12 do not have equal probability.
The general formulation is "If you are trying to collect N unique items, and the probability of getting the ith item is pi, what is the expected number of items that you need to obtain before you complete the collection?"
Now I understand why I never completed that 1977 Star Wars trading card set! I kept buying and buying until my jaws hurt from chewing all the gum, but with 66 cards in the set, my quest was essentially futile.
Alas, the coveted "light saber" card shall never be mine: the expected number of cards required to complete the set is 315, assuming all cards have an equal probability.