Discriminant Analysis, Priors, and Fairy-Selection

A student in my multivariate class last month asked a question about prior probability specifications in discriminant function analysis:
What if I don't know what the probabilities are in my population? Is it best to just use the default in PROC DISCRIM?

First, a quick refresher on priors in discriminant analysis. Consider the problem of classifying 150 cases (let's say, irises) into three categories (let's say, varieties). I have four different measurements taken from each of the flowers.

If I walk through the bog and pick another flower and measure its 4 characteristics, how well can I expect to perform in classifying it as the right variety? One way to derive a classification algorithm is to use linear discriminant analysis.

A linear discriminant function to predict group membership is based on the squared Mahalanobis distance from each observation to the centroid of the group, plus a function of the prior probability of membership in that group.
This generalized squared distance is then converted into a score of similarity to each group, and the case is classified into the group it is most similar to.
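
In symbols, the pooled-covariance (linear) case can be written roughly as follows. The notation is mine, not something you will see labeled this way in the output: \bar{x}_t is the centroid of group t, S_p is the pooled covariance matrix, and q_t is the prior for group t.

```latex
\[
D_t^2(x) = (x - \bar{x}_t)^\top S_p^{-1} (x - \bar{x}_t) \;-\; 2 \ln q_t
\]
\[
p(t \mid x) = \frac{\exp\!\left(-\tfrac{1}{2} D_t^2(x)\right)}
                   {\sum_u \exp\!\left(-\tfrac{1}{2} D_u^2(x)\right)}
\]
```

The first term of D_t^2(x) is the squared Mahalanobis distance, the second is the function of the prior, and p(t | x) is the similarity score: the case goes to the group with the largest value.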

The prior probability is the probability of an observation coming from a particular group in a simple random sample with replacement.

If the prior probabilities are the same for all three of the groups (also known as equal priors), then the function is only based on the squared Mahalanobis distance.

If the prior for group A is larger than for groups B and C, then the function makes it more likely that an observation will be classified as group A, all else being equal.
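
To see that shift with numbers, here is a small sketch in Python (chosen only so the arithmetic is easy to run anywhere; it is not how PROC DISCRIM is implemented). It assumes the usual form of the generalized squared distance, squared Mahalanobis distance minus 2 ln(prior). The three distances are made up for illustration, and the function names are hypothetical.

```python
import math

def generalized_sq_distance(sq_mahalanobis, prior):
    # Assumed form: squared Mahalanobis distance minus 2 * ln(prior).
    return sq_mahalanobis - 2.0 * math.log(prior)

def posteriors(sq_dists, priors):
    # Convert generalized squared distances into posterior probabilities.
    d = {g: generalized_sq_distance(sq_dists[g], priors[g]) for g in sq_dists}
    d_min = min(d.values())  # subtract the minimum for numerical stability
    w = {g: math.exp(-0.5 * (d[g] - d_min)) for g in d}
    total = sum(w.values())
    return {g: w[g] / total for g in w}

def classify(sq_dists, priors):
    # Assign the case to the group with the largest posterior probability.
    post = posteriors(sq_dists, priors)
    return max(post, key=post.get)

# Made-up squared Mahalanobis distances from one new case to each centroid.
sq_dists = {"A": 2.0, "B": 1.5, "C": 4.0}

equal = {"A": 1 / 3, "B": 1 / 3, "C": 1 / 3}
unequal = {"A": 0.5, "B": 0.25, "C": 0.25}

print(classify(sq_dists, equal))    # closest centroid wins
print(classify(sq_dists, unequal))  # the larger prior for A can change the answer
```

With equal priors the case goes to B, the nearest centroid. With a prior of .5 on A, the penalty on B and C is large enough that the same case is classified as A.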

The default in PROC DISCRIM is equal priors. This default makes sense in the context of developing computational software: the function with equal priors is the simplest, and therefore the most computationally efficient.
PRIORS equal;

Alternatives are proportional priors (using priors that are the proportion of observations from each group in the same input data set) and user-specified priors (just what it sounds like: specify them yourself).
PRIORS proportional;
PRIORS 'A' = .5 'B' = .25 'C' = .25;
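
Proportional priors are nothing more than the class shares in the input data. A minimal Python sketch of that arithmetic follows. The labels and counts are invented, and `proportional_priors` and `priors_statement` are hypothetical helper names, not part of SAS.

```python
from collections import Counter

def proportional_priors(labels):
    # Share of observations in each group of the input data.
    counts = Counter(labels)
    n = sum(counts.values())
    return {group: count / n for group, count in counts.items()}

def priors_statement(priors):
    # Format the shares as a user-specified PRIORS statement.
    parts = " ".join(f"'{group}' = {p:g}" for group, p in priors.items())
    return f"PRIORS {parts};"

# Invented sample: 50 cases from A, 30 from B, 20 from C.
labels = ["A"] * 50 + ["B"] * 30 + ["C"] * 20

priors = proportional_priors(labels)
print(priors_statement(priors))
```

If the sample was not drawn at random (for example, if one group was deliberately oversampled), these shares describe the sample and not the population.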

Of course this kind of problem is far more interesting when you consider something like people making choices, such as kids choosing an action figure of Tinkerbell, Rosetta, or Vidia. Those certainly don't have equal priors, and if your daughter's anything like mine, she doesn't want to be classified into the wrong group.

So back to the original question:
What if I don't know what the probabilities are in my population? Is it best to just use the default in PROC DISCRIM?

In this case, using the default would probably not be a great idea, as it would assign the dolls with equal probability, all else being equal.

So if not the default, then what should you use? This depends on what you're going to be scoring. Your priors should reflect the probabilities in the population that you will be scoring in the future. Some strategies for getting a decent estimate:
1. Go to historical data to see what the probabilities have been in the past.
2. If your input data set is a simple random sample, use proportional priors.
3. Take a simple random sample from the population and count up the number from each group. This can determine the priors.
4. Combine the probabilities you think are correct with the cost of different types of misclassification.

For example, suppose that, among 4-year-olds, the probabilities of wanting the Tinkerbell, Rosetta, and Vidia action figures are really 0.50, 0.35, and 0.15, respectively. After all, not many kids want to be the villain.
PRIORS 'Tink' = .5 'Rosetta' = .35 'Vidia' = .15;

What is the cost of giving a girl the Rosetta doll when she wanted Tinkerbell? What's the cost of giving a girl Vidia when she wanted Rosetta? And so on. A table is shown below (based on a very small sample of three interviews of 4-year-old girls):

Clearly the cost of an error is not the same for all errors. It is far worse to assign Vidia to a girl who doesn't want Vidia than for any other error to occur. Also notice the small detail that Vidia fans would prefer to get Rosetta over Tinkerbell. For birthday party favors, I'd massage those priors to err on the side of giving out Rosetta over Vidia.
PRIORS 'Tink' = .5 'Rosetta' = .4 'Vidia' = .1;
Of course, depending on your tolerance for crying, you might just give everyone Rosetta and be done with it. But then, really, isn't variety the spice of life?
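
If you would rather make the trade-off explicit than massage the priors by hand, one option is to take the posterior probabilities for each child and pick the gift with the lowest expected cost. The Python sketch below illustrates the idea. The cost numbers and the posterior probabilities are invented for illustration (they are not the values from my three interviews), and the function names are hypothetical.

```python
def expected_costs(posterior, cost):
    # cost[wanted][given] is the cost of handing out `given`
    # to a child who really wanted `wanted`.
    gifts = list(next(iter(cost.values())).keys())
    return {
        gift: sum(posterior[wanted] * cost[wanted][gift] for wanted in posterior)
        for gift in gifts
    }

def cheapest_gift(posterior, cost):
    # Choose the gift that minimizes the expected cost of a mistake.
    costs = expected_costs(posterior, cost)
    return min(costs, key=costs.get)

# Invented costs: giving Vidia to a non-Vidia fan is the worst error,
# and Vidia fans mind Rosetta less than Tinkerbell.
cost = {
    "Tink":    {"Tink": 0, "Rosetta": 1, "Vidia": 5},
    "Rosetta": {"Tink": 1, "Rosetta": 0, "Vidia": 5},
    "Vidia":   {"Tink": 2, "Rosetta": 1, "Vidia": 0},
}

# Invented posterior probabilities for one child.
posterior = {"Tink": 0.30, "Rosetta": 0.30, "Vidia": 0.40}

print(expected_costs(posterior, cost))
print(cheapest_gift(posterior, cost))
```

Here Vidia has the highest posterior probability, but Rosetta has the lowest expected cost, so Rosetta is the safer party favor.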

I hope this has helped at least one or two of you out there who were having trouble with priors. The same concepts apply in logistic regression with offset variables, by the way. But that's a party favor for another day.

3 Comments

Cat Truxillo on January 7, 2011 9:46 am

Thanks for the inside information on why the default was chosen for PROC DISCRIM-- the grapevine wisdom passed on to me was the computational efficiency explanation. It is useful to learn that there was another reason.
You're certainly correct that discriminant analysis is fairly robust to misspecified priors in many cases. A lot of the studies I encounter use oversampling (as I did when creating my classification table for the fairy preferences) and so proportional priors would be equal for the sample data. That could be a problem when the population proportions are 80/10/10, for example. But it also depends on how different the groups are to begin with. Fisher's Iris data has easily distinguishable groups, so it is easy to discriminate regardless of priors. Human behavior is not typically so compartmentalized and requires more care to predict adequately. For that matter, the choice of a modeling approach (LDA, QDA, logistic regression, etc.) can be evaluated empirically.
Luckily, in PROC DISCRIM it is a trivial task to evaluate the importance of priors empirically. When applying a predictive scoring algorithm such as LDA, some empirical validation is in order, as well as continued monitoring of the scoring performance over time. (There are several reasons that DISCRIM is one of my favorite PROCs in SAS/STAT, and one of them is the ease of scoring and validation. Another is that the formulas are in the LISTING output).
Thanks for posting, Rick!
Rick Wicklin on January 7, 2011 9:11 am

For observational studies, I default to proportional priors (which means I am assuming a random sample). I only use equal priors if I truly think that the classes are equal in the population (for example, males and females). I think equal priors are the default because you can prove some optimality condition for classifying Gaussian data....
I think a more relevant issue is whether the choice of a prior matters in practice. In other words, what is the sensitivity of LDA classification to the priors for a given set of data? I would guess that the choice of a prior probably doesn't matter in many cases.
