The Punchline: MANOVA or a Mixed Model?

Edited to add: Thanks to Larry Madger for noticing an important omission in my code below. I have updated the programs to include the response variable, which allows the responses to have different means.

So, if you were reading last week, we talked about how to structure your data for a mixed models repeated measures analysis. As my friend Rick pointed out, there’s more than one way to restructure your data (if you ask real nice, he’ll also do it in PROC IML, the rock star of programming languages). Then we played with a data set in which the dependent measurements were not ordered over time. In fact, they weren’t even measurements of the same variable.
The Scene:
To increase the amount of money customers deposit in three different account types, a bank designs a factorial experiment with two factors: promotion (Gift or Discount) and minimum balance ($75, $125, or $225). Offers are sent to existing customers, and the amount each customer deposits into each of the three account types (savings, checking, and investment) is recorded.
The Classical Approach: MANOVA
Observing multiple continuous variables on the same subject is a textbook-perfect scenario for multivariate analysis of variance (MANOVA). MANOVA constructs matrices of sums of squares and cross-products (SSCP) to compare between-group and within-group variability. In doing so, it accounts for the correlation among the dependent variables within a subject and for unequal variances across the dependent variables.
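proc glm data = blog.promoexperiment;
   class promotion minbal;
   /* three responses on the left, full factorial on the right */
   model savbal checkbal investamt = promotion|minbal;
   /* multivariate tests for every effect in the model */
   manova h=_all_;
run;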

The data set, as we discussed last week, has one row per customer and one column per dependent variable.
Just like multivariate repeated measures analysis (which is really just MANOVA with some fancy contrasts pre-cooked), MANOVA drops any customer who is missing even one response, so a little missing data goes a long way toward killing your sample size and, with it, your statistical power. Furthermore, working with covariates can be tricky with repeated measures MANOVA. The MANOVA SSCP matrices require estimating many variance and covariance parameters, which can also eat up your power. Finally, there are four multivariate test statistics, which complicates matters if you are not certain which one is best for you to use.
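For reference, all four statistics can be written in terms of the hypothesis and error SSCP matrices. The notation below (H, E, and the eigenvalues) is standard textbook notation that I am assuming here, not something taken from the PROC GLM output:

```latex
% H = hypothesis SSCP matrix, E = error SSCP matrix.
% \lambda_1 \ge \dots \ge \lambda_s are the nonzero eigenvalues of E^{-1}H.
\begin{aligned}
\text{Wilks' lambda:} \quad
  & \Lambda = \frac{|E|}{|H + E|} = \prod_{i=1}^{s} \frac{1}{1 + \lambda_i} \\
\text{Pillai's trace:} \quad
  & V = \operatorname{tr}\!\left[ H (H + E)^{-1} \right] = \sum_{i=1}^{s} \frac{\lambda_i}{1 + \lambda_i} \\
\text{Hotelling-Lawley trace:} \quad
  & U = \operatorname{tr}\!\left( E^{-1} H \right) = \sum_{i=1}^{s} \lambda_i \\
\text{Roy's greatest root:} \quad
  & \Theta = \lambda_1
\end{aligned}
```

When an effect has a single degree of freedom (promotion, with its two levels, is one), there is only one nonzero eigenvalue and all four statistics lead to the same F test. The choice among them only matters for effects such as minbal and promotion*minbal.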
The Modern Approach: Mixed Models
It turns out that it is really easy to fit an equivalent—but not identical—model in the MIXED procedure.
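proc mixed data = blog.promouni;
   /* response identifies the dependent variable, so it must be a CLASS variable */
   class promotion minbal response;
   /* noint plus response gives each dependent variable its own mean */
   model value1 = response|promotion|minbal / noint;
   /* unstructured covariance among the responses within a customer */
   repeated response / subject = subject type = un;
run;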

The data set has one row per observation (a dependent variable within a customer).
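If you don’t have the long data set handy, here is a minimal sketch of one way to build it from the wide one. The output name work.promouni_sketch, the use of the row number as the subject identifier, and the 1-2-3 coding of response are all my assumptions for illustration (the coding is chosen to line up with the "response 1", "response 2", "response 3" comments in the CONTRAST statements later on):

```sas
/* Sketch only: reshape one row per customer into one row per response. */
data work.promouni_sketch;
   set blog.promoexperiment;

   /* Assumption: the wide data has no customer ID, so use the row number. */
   subject = _n_;

   /* Write one output row for each of the three dependent variables. */
   response = 1; value1 = savbal;    output;   /* savings    */
   response = 2; value1 = checkbal;  output;   /* checking   */
   response = 3; value1 = investamt; output;   /* investment */

   keep subject promotion minbal response value1;
run;
```

Because each customer’s three rows are written together, the result is already sorted by subject, which is what the SUBJECT= option expects when subject is not a CLASS variable.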
More, and Different:
If all we were doing was reproducing MANOVA results with PROC MIXED, I would not be writing this blog. We can do more. Instead of just accommodating unequal variances and covariance within a subject, the mixed models approach directly models the covariance structure of the multiple dependent variables. What’s more, you can simplify that structure, which buys you more power and makes your model easier to interpret. For example, you might suspect that the three dependent variables have equal variances and that every pair of them has the same covariance (compound symmetry).
proc mixed data = blog.promouni;
   class promotion minbal response;
   model value1 = response|promotion|minbal / noint;
   /* compound symmetry: one common variance, one common covariance */
   repeated response / subject = subject type = cs;
run;
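To make the difference concrete, here are the two within-customer covariance matrices for three responses. The symbols are my own notation for illustration, not labels from the PROC MIXED output:

```latex
% Unstructured: 3 variances + 3 covariances = 6 parameters.
\Sigma_{\mathrm{UN}} =
\begin{pmatrix}
\sigma_1^2  & \sigma_{12} & \sigma_{13} \\
\sigma_{12} & \sigma_2^2  & \sigma_{23} \\
\sigma_{13} & \sigma_{23} & \sigma_3^2
\end{pmatrix}
\qquad
% Compound symmetry: common covariance \sigma_c plus residual variance \sigma_e^2 = 2 parameters.
\Sigma_{\mathrm{CS}} =
\begin{pmatrix}
\sigma_c + \sigma_e^2 & \sigma_c              & \sigma_c \\
\sigma_c              & \sigma_c + \sigma_e^2 & \sigma_c \\
\sigma_c              & \sigma_c              & \sigma_c + \sigma_e^2
\end{pmatrix}
```

Compound symmetry is the special case of the unstructured matrix in which all three variances are equal and all three covariances are equal, so the two models are nested and differ by four covariance parameters.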

The fit statistics in the mixed model enable model comparison. Since the mean model is identical in both cases, fit statistics based on REML are appropriate.

Fit Statistics                 Unstructured (type=un)   Compound symmetry (type=cs)
-2 Res Log Likelihood          2585.6                   2851.2
AIC (smaller is better)        2597.6                   2855.2
AICC (smaller is better)       2597.8                   2855.2
BIC (smaller is better)        2615.4                   2861.1
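Because compound symmetry is a special case of the unstructured matrix and the fixed effects are the same, the two REML fits can also be compared with a likelihood ratio test. The sketch below simply plugs in the -2 Res Log Likelihood values reported above. The parameter counts (6 for unstructured, 2 for compound symmetry) are my assumption for three responses, although they do agree with the gap between each AIC and its -2 Res Log Likelihood:

```sas
/* Sketch only: REML likelihood ratio test of type=cs against type=un. */
data _null_;
   neg2ll_un = 2585.6;                  /* -2 Res Log Likelihood, unstructured      */
   neg2ll_cs = 2851.2;                  /* -2 Res Log Likelihood, compound symmetry */

   lrt = neg2ll_cs - neg2ll_un;         /* chi-square statistic                     */
   df  = 6 - 2;                         /* difference in covariance parameters      */
   p_value = sdf('CHISQ', lrt, df);     /* upper-tail chi-square probability        */

   put lrt= df= p_value=;
run;
```

The statistic is 265.6 on 4 degrees of freedom, far beyond the 0.05 critical value of about 9.49, so these data do not support the simpler structure. That agrees with the information criteria, which are all smaller for the unstructured model.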

Along with the freedom to change the covariance structure, a mixed model brings other advantages that tag along: it handles missing data more efficiently, it makes covariates easy to include, it accommodates multiple levels of nesting (measurements within subjects within sales territories within your wildest imaginings), it lets you model a time component, and it allows heterogeneous covariance across groups, to name a few.
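As one illustration of the heterogeneous groups idea, the GROUP= option of the REPEATED statement estimates a separate covariance matrix for each level of a classification variable. This is only a sketch. Grouping by promotion is my choice for the example, and it assumes response is coded as a classification variable in blog.promouni:

```sas
/* Sketch only: a separate unstructured covariance matrix for each promotion. */
proc mixed data = blog.promouni;
   class promotion minbal response;
   model value1 = response|promotion|minbal / noint;
   /* group=promotion doubles the covariance parameters: 6 per promotion, 12 total */
   repeated response / subject = subject type = un group = promotion;
run;
```

Since this model has the same fixed effects as the earlier ones, its REML fit statistics can be compared with theirs to judge whether the extra covariance parameters are worth it.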

To get hypothesis tests that are very similar to the GLM MANOVA tests, you can add the following CONTRAST statements to the PROC MIXED step (with thanks to my friend and colleague Jill Tao for slogging through these). The coefficients assume that response is a classification variable listed after promotion and minbal in the CLASS statement, so that response varies fastest within each interaction:

contrast 'promotion effect in PROC GLM'
promotion 1 -1 promotion*response 1 0 0 -1, /*** response 1 ***/
promotion 1 -1 promotion*response 0 1 0 0 -1, /*** response 2 ***/
promotion 1 -1 promotion*response 0 0 1 0 0 -1; /*** response 3 ***/

contrast 'minbal effect in PROC GLM'
minbal 1 -1 minbal*response 1 0 0 -1, /*** response 1, minbal 1-2 ***/
minbal 1 0 -1 minbal*response 1 0 0 0 0 0 -1, /*** response 1, minbal 1-3 ***/
minbal 1 -1 minbal*response 0 1 0 0 -1, /*** response 2, minbal 1-2 ***/
minbal 1 0 -1 minbal*response 0 1 0 0 0 0 0 -1, /*** response 2, minbal 1-3 ***/
minbal 1 -1 minbal*response 0 0 1 0 0 -1, /*** response 3, minbal 1-2 ***/
minbal 1 0 -1 minbal*response 0 0 1 0 0 0 0 0 -1; /*** response 3, minbal 1-3 ***/

contrast 'promotion*minbal in PROC GLM'
promotion*minbal 1 -1 0 -1 1 0 promotion*minbal*response 1 0 0 -1 0 0 0 0 0 -1 0 0 1,
/*** response 1, promotion*minbal (1-2) for 1 vs (1-2) for 2 ***/
promotion*minbal 1 0 -1 -1 0 1 promotion*minbal*response 1 0 0 0 0 0 -1 0 0 -1 0 0 0 0 0 1,
/*** response 1, promotion*minbal (1-3) for 1 vs (1-3) for 2 ***/
promotion*minbal 1 -1 0 -1 1 0 promotion*minbal*response 0 1 0 0 -1 0 0 0 0 0 -1 0 0 1,
/*** response 2, promotion*minbal (1-2) for 1 vs (1-2) for 2 ***/
promotion*minbal 1 0 -1 -1 0 1 promotion*minbal*response 0 1 0 0 0 0 0 -1 0 0 -1 0 0 0 0 0 1,
/*** response 2, promotion*minbal (1-3) for 1 vs (1-3) for 2 ***/
promotion*minbal 1 -1 0 -1 1 0 promotion*minbal*response 0 0 1 0 0 -1 0 0 0 0 0 -1 0 0 1,
/*** response 3, promotion*minbal (1-2) for 1 vs (1-2) for 2 ***/
promotion*minbal 1 0 -1 -1 0 1 promotion*minbal*response 0 0 1 0 0 0 0 0 -1 0 0 -1 0 0 0 0 0 1;
/*** response 3, promotion*minbal (1-3) for 1 vs (1-3) for 2 ***/
Variation on a Theme: Mixture of Distributions in PROC GLIMMIX
Few days go by that I don’t use the GLIMMIX procedure, and as it happens, there’s a trick in PROC GLIMMIX that makes these types of models even more flexible. Starting in SAS 9.2, you can model a mixture of distributions from the exponential family, such as one gamma and two normal responses. If my data included a column (called distrib here) holding the distribution name for each observation, then I could fit the model as follows:
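proc glimmix data = blog.promouni;
   /* distrib is used as a model effect, so it must be a CLASS variable */
   class promotion minbal distrib;
   /* dist=byobs(distrib) reads each observation's distribution from distrib */
   model value1 = distrib|promotion|minbal / noint dist=byobs(distrib);
   /* a random intercept per customer induces the within-customer correlation */
   random intercept / subject = subject;
run;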

Or like this, instead:
proc glimmix data = blog.promouni;
   class promotion minbal distrib;
   model value1 = distrib|promotion|minbal / noint dist=byobs(distrib);
   /* unstructured residual (R-side) covariance within each customer */
   random _residual_ / subject = subject type = un;
run;

Those two models are not equivalent, and they both use pseudo likelihood estimation, so you will probably only use this kind of a model in circumstances where nothing else will do the job. Still, it’s quite a bit more than could be done even a couple of years ago.
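For completeness, here is a minimal sketch of how a distribution column might be added to the long data set. Which response is gamma is not something I have specified anywhere above, so treating response 3 as the gamma one is purely hypothetical, as are the output data set name and the numeric 1-2-3 coding of response:

```sas
/* Sketch only: add a distribution name to each row for dist=byobs(). */
data work.promouni_dist_sketch;
   set blog.promouni;
   length distrib $ 6;

   /* Hypothetical assignment: response 3 is gamma, the other two are normal. */
   if response = 3 then distrib = 'Gamma';
   else distrib = 'Normal';
run;
```

Keep in mind that a gamma response has to be strictly positive, so a customer with a zero deposit in that account could not be modeled this way.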
I know I’m keeping you hanging on for that punchline. So here you are (with my deepest apologies)…
Three correlated responses walk into a bar.
One asks for a pilsner. The second asks for an ale.
The third one tells the bartender, “I’m just not feeling normal today. Better gamma something mixed.”

About Author

Catherine (Cat) Truxillo

Director of Analytical Education, SAS

Catherine Truxillo, Ph.D. has written or co-written SAS training courses for advanced statistical methods, including: multivariate statistics, linear and generalized linear mixed models, multilevel models, structural equation models, imputation methods for missing data, statistical process control, design and analysis of experiments, and cluster analysis. She also teaches courses on leadership and communication in data science.
