Editor's Note: The following question was recently asked of our statistical training instructors. Terry Woodfield, along with Bob Lucas, took the time to write this eloquent and easily digestible answer.
Question: I'm trying to get a general – very general – understanding of what the Bayes theorem is and what it is used for. Can anyone give me a simple definition of the Bayes theorem – and by simple I mean really simple, like if you were trying to explain it to an above-average squirrel?
Maybe a fill in the blank thing, like this:
The Bayes theorem (horribly dumbed-down) is...
It’s very useful in situations like these...
It’s a great choice to use in these situations because…
Answer: Here is my take on Bayes theorem. In life, almost everything is “conditional.” I can ask, “What is the life expectancy of a Caucasian male?” The answer may be something like 72 years. However, I am 54 years old, so a better question for me is, “What is the life expectancy for Caucasian males that have survived 54 years?” The answer will almost certainly be larger than 72. I want an estimate “conditioned on” the fact that I have already lived 54 years. Life is dynamic. Things are constantly changing. However, too often decisions are based on static information. Bayes theorem takes advantage of dynamic information to give a better, more correct answer.
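To make the "conditioned on" idea concrete, here is a minimal sketch in Python. The ages at death are invented purely for illustration (they are not real mortality data), and the function name `life_expectancy` is a hypothetical helper, not something from the original answer.

```python
from statistics import mean

# Hypothetical ages at death for a tiny made-up cohort (NOT real data).
ages_at_death = [2, 35, 48, 60, 70, 75, 80, 82, 85, 90]


def life_expectancy(ages, survived_to=0):
    """Average age at death among those who lived past `survived_to`."""
    survivors = [age for age in ages if age > survived_to]
    return mean(survivors)


# Unconditional: average over everyone in the cohort.
overall = life_expectancy(ages_at_death)

# Conditional: average over only those who survived past 54.
given_54 = life_expectancy(ages_at_death, survived_to=54)

print(f"Life expectancy, everyone:        {overall:.1f}")
print(f"Life expectancy, given age > 54:  {given_54:.1f}")
```

Dropping the early deaths from the average can only raise it, which is why the conditional answer is larger than the unconditional one.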
Rather than give you probability examples, I’ll stick with simple lifetime examples. Years ago a headline article appeared that declared that married males live longer than single males. (I recall my wife looking sad as she read the article.) For the sake of illustration, suppose the article states that a married male’s life expectancy is 72 years, and a single male’s life expectancy is 66 years. Life expectancy is just the average age at death. The article presented several hypotheses: the stress of dating shortens life; a wife ensures a healthier diet; a wife doesn’t tolerate “bad” habits like smoking and drinking. However, a simple question from the Bayes perspective solves the riddle: How many married men died when they were two years old? None, of course, whereas males who die in childhood are all counted as single, which pulls the single average down. Suppose the article had instead investigated, for example, the life expectancy of 54-year-old married and 54-year-old single males. The numbers would be almost identical. (Google reveals current research that still seems to give married males an advantage.)
Bayes theorem gives a nice mathematical representation that helps you calculate Prob(Condition A | Condition B), which is read as “probability of condition A given that condition B already exists or has occurred.”
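For reference, the standard statement of the theorem, with A and B standing for the two conditions, can be written as:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```

Here P(A) is the probability of A before B is taken into account, and P(A | B) is the updated probability once B is known.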
Bayes theorem is simple, and it is in every statistician’s toolkit. However, I conjecture that your interest probably was motivated by something more general, an area that is currently a hot topic: Bayesian analysis (Bayesian analytics, Bayesian statistics, Bayesian modeling, etc.). Bayesian analysis uses prior information plus data to arrive at predictions that are expressed in terms of posterior probabilities. For example, prior experience suggests that 1% of prospects will respond to a direct mail campaign. Current data provides actual outcomes from appropriate campaigns for individual customers. These customers have attributes that are employed in predicting whether a prospect will respond. The prior probability of an individual responding is 1%, but the posterior probability of a given individual, based on the individual’s attributes like age, income, favorite color, and shoe size, can be larger than 1%, making the individual more appealing for solicitation. If the company ignores individuals scoring under 1% and only solicits those that score above 1%, the company should see increased profits.
Bob Lucas, our Director of Statistical Training, pointed out the example we use in our Applied Analytics Using SAS Enterprise Miner course. Suppose my model gives me an estimate, p(i), of the probability that person i will give a donation. If they give a donation, I get on average $15, less the $1 it cost to mail them. If they do not give, I lose $1 because I wasted mailing to them. So, I calculate the expected profit if I mail someone:
ExpProfit = (15-1)p(i) + (-1)(1-p(i))
The expected profit is $14 x (probability they respond) - $1 x (probability they do not respond).
The optimal decision is to mail to anyone whose expected profit is greater than zero.
In math, find p(i) such that 14p(i) - (1 - p(i)) > 0
Some algebra leads to the optimal cutoff:
p(i) > 1/15
If I mail to anyone whose probability of response is greater than 1/15 then I expect to make money, on average.
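The mailing rule can be sketched in a few lines of Python. The $14 net gain and $1 mailing cost come from the example above; the function names and the list of sample probabilities are hypothetical, chosen only for illustration.

```python
def expected_profit(p, net_gain=14.0, mail_cost=1.0):
    """Expected profit from mailing someone who responds with probability p."""
    return net_gain * p - mail_cost * (1.0 - p)


def break_even_probability(net_gain=14.0, mail_cost=1.0):
    """Solve net_gain*p - mail_cost*(1 - p) = 0 for p."""
    return mail_cost / (net_gain + mail_cost)


cutoff = break_even_probability()  # 1/15 for the numbers in the example

# Hypothetical model scores for four prospects.
for p in (0.01, 0.05, 0.10, 0.25):
    decision = "mail" if expected_profit(p) > 0 else "skip"
    print(f"p = {p:.2f}  expected profit = {expected_profit(p):+.2f}  -> {decision}")
```

Writing the cutoff as mail_cost / (net_gain + mail_cost) makes it easy to see how the threshold moves if the average donation or the mailing cost changes.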
This approach can be generalized to more complex problems with different ways of measuring profit or loss. Take this healthcare example: many screening tests have a high false positive rate. Knowing the prevalence of a disease in the population, one can use Bayes theorem to calculate the probability that a patient actually has the disease given that the patient has a positive test. Because screening tests are designed to have low false negative rates, at the price of high false positive rates, the typical procedure after a positive test is to repeat the test: for rare conditions, the true probability that a patient has the disease is still very low even after one positive result.
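A minimal sketch of the screening calculation follows. The prevalence, sensitivity, and false positive rate are hypothetical numbers picked for illustration, not figures for any real test, and treating the repeat test as independent of the first is a simplifying assumption.

```python
def posterior_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(disease | positive test) via Bayes theorem."""
    true_pos = sensitivity * prevalence
    false_pos = false_positive_rate * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs (assumptions, not data for a real screening test).
prevalence = 0.001          # 1 in 1,000 people has the condition
sensitivity = 0.99          # P(positive | disease), i.e. 1% false negatives
false_positive_rate = 0.05  # P(positive | no disease)

# Probability of disease after one positive result.
after_one = posterior_given_positive(prevalence, sensitivity, false_positive_rate)

# Repeat the test, using the first posterior as the new prior.
# Assumes the two test results are independent given disease status.
after_two = posterior_given_positive(after_one, sensitivity, false_positive_rate)

print(f"After one positive test:  {after_one:.1%}")
print(f"After two positive tests: {after_two:.1%}")
```

With these assumed inputs, a single positive result leaves the probability of disease at roughly 2%, which is why a repeat test is informative: the second positive result raises it substantially.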
I suspect that you wanted a one-sentence explanation. Unfortunately, one-sentence explanations use technical terms that also require one or more sentences to explain. I have tried to address your three fill-in-the-blank statements. When customers use SAS Enterprise Miner for fraud solutions, direct marketing, churn predictions, and so on, they are exploiting Bayes theorem.
Statisticians fall into two camps: frequentists and Bayesians. However, Bayesians point out that the frequentist’s approach is almost always a special case of the Bayesian approach. I do not classify myself as falling into either camp. I just use analytics to solve problems and leave the politics to others. However, I am a firm believer in dynamic versus static analysis. Many things that you believe to be true are actually false because the results are based on a static rather than dynamic analysis. For example, you may believe that wages are actually going down as a function of time. However, if population is growing, average wages are guaranteed to drop because more young wage earners are pushing up from the bottom. On the other hand, if you condition wage calculations on age, for example, you will find that age-based wages are actually increasing. Anyone trying to tell you otherwise is either dishonest or ignorant. One of my mentors advised me, “Never attribute to malice that which is better explained by incompetence.”