Last week, a student in my Mixed Models Analysis Using SAS class sent in the following text message during a discussion of crossover designs (sometimes known as ABBA designs, where factors vary within subjects, not ABBA designs where you’re like a Super Trouper).

*Does it make sense to look at repeated measures (multiple treatments) in the same way as repeated measures (over time)? Is the model essentially the same?*

This is a common point of confusion for people learning mixed models, particularly if they have experience with other types of repeated measures analysis. It is also such a good question, central to selecting a covariance structure in a mixed models analysis, that I decided to turn it into a blog post.

**The Study Design**

In the crossover design the student asks about, each patient came in for three different office visits, so there are repeated measures. Each visit corresponded to a different drug, and the sequence of drugs within patients was randomized. The response was change in heart rate from baseline. There are other effects in the model, which we will not elaborate upon here. Instead, let’s focus on how the variance and covariance part of the model could be handled.

PROC MIXED DATA = crossover;

CLASS patient drug visit;

MODEL hr_change = drug visit [and other effects not of interest here] / ddfm=kr;

**RANDOM or REPEATED [this way lies controversy];**

RUN;

**Mixed Models Repeated Measures Analysis**

The mixed models repeated measures analysis that many people think of enables correlation among observations and possible nonconstant variances through the specification of the R matrix, the covariance matrix of the residuals. For example,

REPEATED visit / SUBJECT = patient TYPE = CS [or other structures];

*What does it mean for the covariances?*

This produces a **V** matrix where the variance of observations is constant, and the covariance between any two visits within a patient is an estimate of σ_{p}^{2}.

The covariance among observations from different patients is 0.
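This isn't SAS, but to make the structure concrete, here is a small numpy sketch of the **V** block for one patient under compound symmetry. The variance values are made up purely for illustration:

```python
import numpy as np

# Illustrative (made-up) variance components
sigma2_p = 4.0   # sigma_p^2: the common within-patient covariance
sigma2_e = 1.0   # sigma^2:   the residual variance

n_visits = 3

# Compound-symmetry block for one patient:
# constant variance on the diagonal, constant covariance off the diagonal
V_block = sigma2_p * np.ones((n_visits, n_visits)) + sigma2_e * np.eye(n_visits)
print(V_block)
# Every diagonal entry is sigma_p^2 + sigma^2 = 5.0;
# every off-diagonal entry is sigma_p^2 = 4.0.
# Observations from different patients sit in different blocks,
# so their covariance in the full block-diagonal V is 0.
```

The full **V** is block diagonal, one such block per patient, which is exactly the "zero covariance across patients" statement above.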

**Mixed Model with a Random Patient Effect**

This can also be conceptualized as a mixed model with multiple observations nested within a larger unit, the subject (here, the patient). For example,

RANDOM patient;

*What does it mean for the covariances?*

This produces a **G** matrix with a constant variance σ_{p}^{2} and zero covariance between patients. The **R** matrix (by default) assumes constant variance and no covariance among residuals. In the final covariance matrix of the observations (**V**), the within-patient covariance is the estimate of σ_{p}^{2}.

**Did You See That?**

Both of the previous models (the RANDOM patient model and the REPEATED visit model with TYPE=CS) lead to the same **V** matrix; they are equivalent in a linear mixed model.
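To see that equivalence concretely, here is a small numpy sketch (outside SAS, with made-up variance values) that builds the one-patient **V** block both ways, using V = ZGZ' + R for the random-effect version:

```python
import numpy as np

# Made-up variance components for illustration
sigma2_p = 4.0   # patient variance (G-side)
sigma2_e = 1.0   # residual variance
n_visits = 3

# G-side formulation: V = Z G Z' + R, one random intercept per patient
Z = np.ones((n_visits, 1))            # design column for the patient effect
G = np.array([[sigma2_p]])            # 1x1 G matrix for this patient
R = sigma2_e * np.eye(n_visits)       # default R: sigma^2 * I
V_random = Z @ G @ Z.T + R

# R-side formulation: compound symmetry written directly into R
V_repeated = sigma2_p * np.ones((n_visits, n_visits)) + sigma2_e * np.eye(n_visits)

print(np.allclose(V_random, V_repeated))  # True: both models imply the same V
```

One practical difference (which comes up again in the comments below): the G-side version estimates σ_{p}^{2} as a variance, so by default it must be nonnegative, while the R-side CS parameter can be negative.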

I have heard people refer to this as a “split-plot analysis,” a convention that is useful because in essence you are treating visits as sub-plots within the whole-plot unit (the patient). The error term for tests of sub-plot treatments (here, drug) is the residual variance, and the error term for tests of whole-plot treatments is the patient variance.

**Multiple Random Effects**

Now consider the example of patients nested within clinics, with multiple observations per patient. If you think of the time points as being nested within a larger “subject” (the patient), and the patient as nested within a larger “subject” (the clinic), you get:

RANDOM patient(clinic) clinic;

In split-plot terminology, the clinic is the whole plot, and σ_{c}^{2} is the error term for testing whole-plot effects. Patient is the sub-plot, and σ_{p}^{2} is the error term for testing sub-plot effects. Visit is the sub-sub plot and σ^{2} is the error term for testing sub-sub plot effects.

*What does it mean for the covariances?*

Under this specification, in the final **V** matrix, two measurements from different patients in the same clinic have an estimated covariance of σ_{c}^{2}.

Furthermore, two measurements from the same patient(clinic) have an estimated covariance of (σ_{c}^{2} + σ_{p}^{2}).
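Again stepping outside SAS for a moment, here is a numpy sketch of **V** for one clinic with two patients and three visits each (variance values made up for illustration), built from indicator matrices for the clinic and patient effects:

```python
import numpy as np

# Made-up variance components
sigma2_c = 2.0   # clinic variance
sigma2_p = 4.0   # patient(clinic) variance
sigma2_e = 1.0   # residual variance

n_patients, n_visits = 2, 3
n = n_patients * n_visits  # 6 observations in one clinic

# Random-effect design matrices for this clinic
Z_clinic = np.ones((n, 1))                                       # all obs share the clinic effect
Z_patient = np.kron(np.eye(n_patients), np.ones((n_visits, 1)))  # patient indicator columns

V = (sigma2_c * Z_clinic @ Z_clinic.T
     + sigma2_p * Z_patient @ Z_patient.T
     + sigma2_e * np.eye(n))

print(V[0, 3])  # different patients, same clinic: sigma_c^2 = 2.0
print(V[0, 1])  # same patient: sigma_c^2 + sigma_p^2 = 6.0
print(V[0, 0])  # variance: sigma_c^2 + sigma_p^2 + sigma^2 = 7.0
```

Observations from different clinics would sit in different blocks of the full **V**, with covariance 0.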

**Random Effects and Repeated Measures**

Now comes the part where your subject matter knowledge is critical. Is the correlation between pairs of visits constant? In other words, for a patient, is the correlation between visit 1 and visit 2 the same as the correlation between visit 1 and visit 3? Are the variances of the visits roughly equal? In other words, is the between-patient variance in heart rate the same at visit 1, at visit 2, and at visit 3?

If the answer to any of these questions is no, then a more general approach is necessary to handle this changing correlation over time. In a split-plot approach, one assumption was that the visits had the same correlation within a subject regardless of distance in time. That's the reason for a random effect for patient(clinic). If that assumption were not warranted, then you could use a repeated measures analysis with **R**-side covariance parameters (other than the default estimate of the residual variance, σ^{2}) which enable the within-subject correlation to change with distance in time. For example,

RANDOM clinic;

REPEATED visit / SUBJECT=patient(clinic) TYPE = AR(1);

*What does it mean for the covariances?*

In the **V** matrix, the covariance between patients from different clinics is 0. The covariance between different patients from the same clinic is σ_{c}^{2}. The covariance between two visits from the same patient is σ_{c}^{2} + σ^{2}ρ^{j}, where j is 1 for observations 1 visit apart, 2 for observations 2 visits apart, and so on, and ρ is the correlation between adjacent visits (the first-order autocorrelation). The variance of an observation is σ_{c}^{2} + σ^{2}.
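One more numpy sketch (not SAS; parameter values made up for illustration) shows the within-patient **V** block this specification implies, combining the AR(1) **R** block with the shared clinic variance:

```python
import numpy as np

# Made-up parameter values
sigma2_c = 2.0   # clinic variance
sigma2_e = 1.0   # residual variance for the AR(1) structure
rho = 0.5        # correlation between adjacent visits

n_visits = 3
# Matrix of lags |i - j| between visit i and visit j
lags = np.abs(np.subtract.outer(np.arange(n_visits), np.arange(n_visits)))

# R-side AR(1) block for one patient: sigma^2 * rho^|lag|
R_block = sigma2_e * rho ** lags

# Add the clinic variance shared by all observations from the same clinic
V_block = sigma2_c + R_block
print(V_block)
# diagonal: sigma_c^2 + sigma^2         = 3.0
# lag 1:    sigma_c^2 + sigma^2 * rho   = 2.5
# lag 2:    sigma_c^2 + sigma^2 * rho^2 = 2.25
```

Unlike the compound-symmetry block, the covariance now decays with distance in time, which is exactly the flexibility the REPEATED statement with TYPE=AR(1) buys you.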

**Is That All There Is to Repeated Measures Analysis in PROC MIXED?**

Well, if you’ve modeled the covariance structure of your population reasonably well, then the fun has just begun. Now you are ready to interpret your fixed effects and estimate quantities of interest to answer your research question.

(If you’re still reading, then welcome to my underground nerd lair! I salute you with the secret handshake!)

Alternatively, you could take a different approach altogether. Hierarchical linear models with random coefficients are exceptionally handy in situations where the number of observations per subject and the spacing between measurements vary across subjects.

We discuss the random coefficients approach in the Multilevel Models class, and that’s a topic for another day.

Regardless of the approach you choose, you can accommodate correlation over time through the **V** matrix in the model, and the MIXED procedure has a number of fit statistics that are useful for model comparison. Also a topic for another day.

I hope this explanation is useful, and that not too many of you got an ABBA song stuck in your head today. See you in class!

## Comments

Might be a bit late for this comment, but my understanding is that if you have a pre-post design with loss to follow-up (some people dropped out, so you don't have post data for everyone), then a t-test or ANOVA will drop the pre data for anyone missing at post. A mixed model, on the other hand, will retain all the data (i.e., it will keep pre observations even if the post observation is missing). You obviously still don't have the post data, but you don't have to throw away data that may have cost good time and money to collect.


That's a terrific point, Tor-- thank you for pointing it out! If you use the NOBOUND option on the PROC MIXED statement, then the RANDOM statement should be able to converge in this case. But by default, the results could be very different.

Nice blog post.

I came across an applied analysis situation recently where one has the choice of using RANDOM vs. REPEATED to obtain equivalent results with a CS structure in the modeling of dyadic data. Because RANDOM estimates a variance component to represent the correlation of dyad members, the correlation has to be positive, but REPEATED allows for both positive and negative correlations among the dyad members. If the correlation among the dyad members is negative, PROC MIXED will converge with the REPEATED syntax version, but not with the RANDOM syntax version. Kenny, Kashy, and Cook mention this issue in their dyadic data analysis text. Perhaps this will help someone else who encounters a similar situation in their own research.

With best wishes,

Tor Neilands

UCSF

Really good post! It's very interesting to see how you can get the same results. One thing I do to minimise confusion between the RANDOM and REPEATED statements is to think of REPEATED as applying when the subject is on the same treatment (with measurements taken at different time points). In the crossover example above I would have fitted a RANDOM statement, since the subject was on different treatments, but again it's good to see why you could fit the model with a REPEATED statement.

Thanks,

Kriss

Great question! You're right that there's not a lot of difference between what a mixed model reveals and what a MV repeated measures analysis reveals about the population in the kind of study you describe. You could also analyze that design with a 2-sample t-test on the difference score, if you wanted to keep it really simple. PROC GLM with a repeated statement might be less efficient than a simpler covariance structure in MIXED because in fitting a MANOVA model you're estimating three parameters where a CS structure would only involve two. So you can earn back some power in the mixed model, but the results should be very similar between a MV repeated measures and a mixed model.

The analysis of pre-post studies with a between-subjects treatment is always kind of contentious because there are so many ways to attack it. You could compute a difference score and fit the model DIFF=TRT. This is equivalent to a 2-sample t-test on the difference. Another approach is an ANCOVA approach, POST=PRE TRT PRE*TRT, or alternatively, DIFF=PRE TRT PRE*TRT. Each model tells you a little something different, but what they all have in common is an assumption of homogeneous variance. If that assumption isn't a good fit, it is possible in a mixed model to fit heterogeneous variances for the different TRT groups.

Really, what I think it boils down to is this: if you get the same model from GLM or MIXED, use the procedure you are more comfortable with. If you are likely to expand on this simpler model (such as extending your study to include a 2nd or 3rd follow-up visit), then it is better to have one consistent analytical approach throughout. If all you'll ever look at is pre-post, then a simpler analysis is typically easier to describe to a lay audience.

Thanks for your question and thanks for reading!

Hi Catherine,

Thanks for your post. I have a question and I will appreciate if you could please help me.

I want to estimate the effect of a treatment through Difference-in-Difference modeling. I use matched pair observation ( 1 treated observation is matched to two control). The data set is in long format. I was just wondering if the below code with random statement is correct?

PROC MIXED DATA = ;

CLASS post treatment classfication X MatchedID ID;

MODEL outcome=post|treatment X1 X2..Xn / SOLUTION;

LSMEANS post|treatment / DIFF;

ESTIMATE 'D-I-D' post * treatment 1 -1 -1 1;

RANDOM Int / SUBJECT=ID(_MatchedID) TYPE=UN;

RUN;

Awesome discussion. As far as I can see, if an experiment has only two measurements (just pretest and posttest) and two groups (experimental and control), then there is no need for PROC MIXED, since there cannot be differences in the correlations between time points when you only have the one correlation.

This describes a lot of educational research, and yet I see people using PROC MIXED in those situations. While there is certainly nothing wrong with that, I don't see how it is any advantage over PROC GLM with a REPEATED statement in those cases. It seems to me it should give you the exact same thing, no?

Is there some new cool amazing advantage here I am missing ? If so, I am sad.