The Monty Hall Problem is one of the most famous problems in elementary probability. It is famous because the correct solution is counter-intuitive and because it caused an uproar when it appeared in the "Ask Marilyn" column in *Parade* magazine in 1990. Discussing the problem has been known to create a Jekyll-and-Hyde effect among mathematicians that transforms ordinarily calm logicians into frenzied, red-faced, name-calling beasts.

The following statement from Wikipedia is the usual statement of the Monty Hall Problem:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host [Monty Hall], who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice?

The implicit assumption here is that the host does not act randomly: he always chooses to reveal one of the two goats. (The other implicit assumption is that "winning" means choosing the car, which might shock rural goat farmers!) The surprising mathematical result is that, on average, if you (the player) switch from your original choice to the other unrevealed door, you will win the car two-thirds of the time. Thus your best strategy is to switch. If you like videos, watch this scene from the movie *21* in which Kevin Spacey uses the Monty Hall Problem to identify a brilliant math student.

A simple analysis of why this "always switch" strategy is advantageous goes like this:

- Case 1: The door you chose hides a goat. The host shows the other goat. If you switch doors, you always win the car. This case occurs with probability 2/3.
- Case 2: The door you chose hides the car. If you switch doors, you lose. This case occurs with probability 1/3.

A section of the Wikipedia article criticizes this simple solution and includes a long discussion of various real and fallacious arguments.

The Monty Hall Problem became (in)famous when thousands of eminent mathematicians and physicists wrote *Parade* magazine "explaining" why switching could not possibly affect the probability of winning the car.
Apparently the great mathematician Paul Erdös refused to believe that switching was a better strategy even after being presented with several mathematical proofs. He was not convinced until he was shown the results of a computer simulation.

Perhaps you, like Erdös, need to see a simulation before you are convinced? Many SAS programmers have written a simulation of the Monty Hall Problem by using the DATA step. The following SAS-oriented discussions are worth reading:

- Irvin Snyder, "Monty Hall, Paul Erdös, and Monte Carlo",
*Proceeding of the MWSUG 2010 conference*. The paper includes the Erdös story and several theoretical solutions, including a decision tree analysis and a Bayesian analysis. The paper also includes DATA step code to simulate the problem. - The
*SAS and R*blog presented a simulation of the Monty Hall Problem in 2010.

What I haven't seen is a simulation in the SAS/IML language. Because of its vectorized nature, you can often write a simulation in SAS/IML without writing any loops. That is the case for the Monty Hall Problem.

Without loss of generality (WLOG), assume that the contestant always picks Door Number 1. The host then shows one of the goats, which is behind either Door Number 2 or Door Number 3. In the following simulation, the probability of winning is computed if you stay with your original guess (which is obviously 1/3) versus if you switch your guess:

proc iml; call randseed(54321); NumSim = 1e5; /* number of simulated games */ car = randfun(NumSim, "Table", {1 1 1}/3); /* unknown door hides car */ guess = j(NumSim, 1, 1); /* WLOG, guess door=1 */ show = choose(car=2, 3, 2); /* host shows a goat */ switch = choose(show=2, 3, 2); /* the door you could switch to */ winIfStay = mean(guess=car); /* (P(win | do not switch) */ winIfSwitch = mean(switch=car); /* (P(win | switch) */ print winIfStay winIfSwitch; |

The simulation (which requires less than 10 lines in SAS/IML) convincingly shows that the probability of winning if you do not switch is 1/3, whereas the probability of winning the car if you switch is 2/3.

The simulation uses three SAS/IML functions that are worth explaining:

- The "table" distribution is used to randomly choose the location of the prize. The RANDFUN function returns 100,000 draws from the discrete uniform distribution on {1, 2, 3}.
- The CHOOSE function is a vectorised IF-THEN/ELSE statement that can simplify complex assignment statements.
- The MEAN function is used to compute the proportion of 1s in a 0/1 vector.

I'd like to thank my colleague Stephen Mistler for recently reminding me about the Monty Hall Problem and for advocating that simulations provide clarity to probability problems.

Don't like my approach? Think you can do better? Still don't believe the result? Leave a comment.

## 14 Comments

I must confess: as a non-mathematician/statistician, I am forever confused among the Monty Hall problem, Monte Carlo simulations, and Montenegro vacations. It's good to know that it's possible to use SAS/IML to simulate at least two out of three of these.

That's funny because as I was writing this post I kept writing "Monte Hall" instead of "Monty Hall." My fingers are trained from years of discussing Monte Carlo simulations.

My favourite argument to disbelievers is this: if you got it right the first time, then you shouldn't switch. If you got it wrong then you should. The probability that you get it right first is 1/3, which is the probability that you don't switch and win. The probability that you got it wrong the first time is 2/3, which is the probability that you switch and win.

Hello Rick and all your readers. I find your articles to be very informative. Keep up the great posts.

As pertains to this particular problem of mathematics, I for one believe there is a fundamental error in the mathematical solution. If the initial question posed is "Choose one of three doors, of which there is a car behind one and only one of them", then the probability of choosing the correct door is 1/3. If however the question posed is "choose one of three doors, of which there is a car behind one and only one of them and it is not the third door", then the probability of choosing the correct door is 1/2. The revelation that the car is not behind the third door, if done only after the contestant makes an initial choice of one of three doors, simply means the odds of having selected the winning door is improved to 1/2. The offer to switch doors is simply an offer to choose between one of two doors. Intuitively, I believe the situation is easier to comprehend if one thinks of the doors as being W for winner, L for loser, and E for excluded (the door the host will reveal as not winning). Without knowing door E, as in the initial case, the probability of winning is 1/3. Knowing which door is E, the only fact revealed by the host after the initial choice, the probability of winning becomes between W and L only. The odds that you chose the winner before the host revealed E simply increased from 1/3 to 1/2. The argument that the probability of winning increases to 2/3 by switching doors is false because there will always be at least one door (i.e. 1/1 probability) that does not have a car behind it and that is not chosen by the contestant. If the contestant was offered to make two choices at the outset, as in "pick which door does not have a car behind it and you win what is behind the remaining doors", then the probability of winning becomes 2/3. Consider the possible challenge "Pick a door which is not a winner!" The odds are 2/3. If a correct losing door is chosen, then a second challenge is presented. "Pick the losing door" or "Pick the winning door!" In either case, the odds are 1/2. There is only one occurrence of a 2/3 probability in the scenario, and that is of picking a "losing" door out of a choice of three doors. This corresponds to the probability that the host will not have a "choice" of two doors to open to reveal a goat. Besides, the producers needed to figure out some way to fill the available time while keeping the audience on its toes.

Pursuing this interesting problem a little further, when an initial choice is made by the contestant, there is a 2/3 probability that a losing door is chosen. The host always has a losing (excluded) door that may be revealed thereafter. Thus if there was a 2/3 probability that the chosen door is a losing door, then there must be a 2/3 probability that the remaining, unrevealed door is the winning door. Thus goes the argument for switching doors. More likely a losing door was chosen at the outset so a switch will result in a winning door. Seems to make sense!

One more time. Suppose there are five doors, one with a prize. Initially the odds of winning the prize are 1/5, of losing 4/5. A choice is made, the host reveals a losing door, and asks his question - switch doors or stick with your choice. It doesn't matter whether you stay with your choice or not. There are now four unopened doors and the odds of winning are 1/4. Repeat this again and the odds become 1/3, similar to the Monty Hall problem. Repeat it once more and now, with two doors left, one chosen and the other not, the odds of winning become 2/3 by switching doors? Seems not to make sense! That's what walking the dog gets you! It seems to me that regardless of how the question is posed and regardless of any previous choices made or revealed, in the end there are only two choices left with a 1/2 probability of selecting the winner.

With N≥3 doors, the odds of winning improve from 1/N to (N-1)/(N(N-2)) if the host reveals a goat and then you switch doors.

Rick,

As you know, I'm NOT a statistician. However, that said, doesn't the formula for more than three doors have to include a 1- the rest of the calculation. And, that said, wouldn't (N-2)/(N(N-2)) be more correct for the four door situation where the contestant never goes back to his/her initial choice?

There could of course be a problem with my code:

title '4 Doors maintaining an initial guess of 1';

data mh;

array a(3);

do i = 1 to 100000;

* Monty puts the prize behind a random door;

prize = rand("TABLE",.25,.25,.25);

* We pick door 1 as our choice;

initial_guess = 1;

* pick goat to reveal;

a(1)=prize;

a(2)=initial_guess;

call missing(a(3));

do until (door_to_remove not in a);

door_to_remove=rand("TABLE",.25,.25,.25);

end;

*pick switched guess;

a(1)=door_to_remove;

do until (new_guess not in a);

new_guess = rand("TABLE",.25,.25,.25);

end;

* pick 2nd goat to reveal;

a(1)=prize;

a(2)=initial_guess;

a(3)=door_to_remove;

do until (door_to_remove2 not in a);

door_to_remove2=rand("TABLE",.25,.25,.25);

end;

*pick switched guess;

a(1)=door_to_remove;

a(3)=door_to_remove2;

do until (new_guess not in a);

new_guess = rand("TABLE",.25,.25,.25);

end;

win_by_keep = (initial_guess eq prize);

win_by_switch = (new_guess eq prize);

output;

end;

run;

proc freq data=mh;

tables win_by_keep win_by_switch;

run;

Art, CEO, AnalystFinder.com

Of course, based on my initial question, I meant:

1-( (N-2)/(N(N-2))) = 1 - 1/N

Good afternoon Rick and readers. I thought I had posted additional comment this morning but now I figure I must have done something incorrectly because it seems not to be there. I now stand convinced of the mathematical soundness to switching. It would be interesting to see a simulation of the situation where there are five doors at the outset and Monty will reveal one "losing" door after each iteration of choices made by the contestant. What choice should the contestant make after each iteration - stick or switch? Thanks again Rick for the enlightenment. Keep up the great work.

What I always think of when this topic comes up is that most of the Monty Hall were "pick from door 1 or door 2" and then offered "the curtain Carol Merril is standing beside".

And of course the solution is similar to others based on expected outcomes: Expected outcome assumes you get to repeat the situation multiple times. The Player gets ONE choice.

I believe that this snippet of code requires SAS 9.4, as earlier versions (e.g., 9.2) return an unresolved module for RANDFUN.

Yes. If you are using an old version of SAS, you can replace the call to the RANDFUN function by

car = j(NumSim,1);

call randgen(car, "Table", {1 1 1}/3);

By the way, there have been five releases of SAS/IML since the 9.2 days, so there are a LOT of new functions!

Pingback: Learning SAS programming for R users - The SAS Dummy