Last week, a student in my Mixed Models Analysis Using SAS class sent in the following text message during a discussion of crossover designs (sometimes known as AB/BA designs, where factors vary within subjects, not ABBA designs where you're like a Super Trouper).
Does it make sense to look at repeated measures (multiple treatments) in the same way as repeated measures (over time)? Is the model essentially the same?
This is a common point of confusion for people learning mixed models, particularly if they have experience with other types of repeated measures analysis. It is also such a good question, one that is central to selecting a covariance structure in a mixed models analysis, that I decided to make a blog post of it.
The Study Design
In the crossover design the student asks about, each patient came in for three different office visits, so there are repeated measures. Each visit corresponded to a different drug, and the sequence of drugs within patients was randomized. The response was change in heart rate from baseline. There are other effects in the model, which we will not elaborate upon here. Instead, let’s focus on how the variance and covariance part of the model could be handled.
PROC MIXED DATA = crossover;
CLASS patient drug visit;
MODEL hr_change = drug visit [and other effects not of interest here]/ ddfm=kr;
RANDOM or REPEATED [this way goes controversy];
RUN;
Mixed Models Repeated Measures Analysis
The mixed models repeated measures analysis that many people think of enables correlation among observations and possible nonconstant variances through the specification of the R matrix, the covariance matrix of the residuals. For example,
REPEATED visit / SUBJECT = patient TYPE = CS [or other structures];
What does it mean for the covariances?
This produces a V matrix in which the variance of the observations is constant, and the covariance between any two visits from the same patient is an estimate of σp².
The covariance among observations from different patients is 0.
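To make the compound-symmetry structure concrete, here is a small NumPy sketch of that V matrix for two patients with three visits each. This is not SAS output; the variance values are made up purely for illustration:

```python
import numpy as np

# Hypothetical variance components, chosen only for illustration
sigma2 = 4.0    # residual variance, sigma^2
sigma2_p = 2.0  # within-patient covariance, sigma_p^2

n_visits = 3

# Compound-symmetry block for one patient: constant variance on the
# diagonal, constant covariance sigma_p^2 off the diagonal
cs_block = np.full((n_visits, n_visits), sigma2_p)
np.fill_diagonal(cs_block, sigma2_p + sigma2)

# V for two patients is block diagonal: observations from different
# patients have covariance 0
V = np.kron(np.eye(2), cs_block)
print(V)
```

Every diagonal entry is the same (6.0 here), every within-patient off-diagonal entry is σp² (2.0 here), and every between-patient entry is 0.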
Mixed Model with a Random Patient Effect
This can also be conceptualized as a mixed model with multiple observations nested within a larger observation. For example,
RANDOM patient;
What does it mean for the covariances?
This produces a G matrix with a constant variance σp² for the patient effects and zero covariance between patients. The R matrix (by default) assumes constant variance and no covariance among the residuals. In the final covariance matrix of the observations (V), the within-patient covariance is the estimate of σp².
Did You See That?
Both of the previous models (RANDOM patient, and REPEATED visit with TYPE=CS) lead to the same V matrix; they are equivalent in a linear mixed model.
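The equivalence is easy to verify numerically. Here is a NumPy sketch, using assumed variance values rather than estimates from any real fit, showing that ZGZ′ + R from the random-patient formulation reproduces the compound-symmetry block exactly:

```python
import numpy as np

sigma2, sigma2_p = 4.0, 2.0  # assumed residual and patient variances
n_visits = 3

# RANDOM patient: V = Z G Z' + R, where Z is the patient-indicator
# column, G holds the patient variance, and R = sigma^2 * I
Z = np.ones((n_visits, 1))
G = np.array([[sigma2_p]])
R = sigma2 * np.eye(n_visits)
V_random = Z @ G @ Z.T + R

# REPEATED visit / TYPE=CS: constant variance, constant covariance
V_cs = np.full((n_visits, n_visits), sigma2_p)
np.fill_diagonal(V_cs, sigma2_p + sigma2)

print(np.allclose(V_random, V_cs))  # prints True
```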
I have heard people refer to this as a "split-plot analysis," a convention that is useful because in essence you are treating visits as sub-plots within the whole-plot unit (the patient). The error term for tests of sub-plot treatments (here, drug) is the residual variance, and the error term for tests of whole-plot treatments is the patient variance.
Multiple Random Effects
Now consider the example of patients nested within clinics. There are multiple observations per patient. If you think of the time points as being nested within a larger "subject," the patient, and the patient as nested within a larger "subject," the clinic, you get:
RANDOM patient(clinic) clinic;
In split-plot terminology, the clinic is the whole plot, and σc² is the error term for testing whole-plot effects. Patient is the sub-plot, and σp² is the error term for testing sub-plot effects. Visit is the sub-sub-plot, and σ² is the error term for testing sub-sub-plot effects.
What does it mean for the covariances?
Under this specification, in the final V matrix, two measurements from different patients in the same clinic have an estimated covariance of σc².
Furthermore, two measurements from the same patient (within a clinic) have an estimated covariance of σc² + σp².
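As a sanity check, those covariances can be read directly off a V matrix built from the G-side specification. A NumPy sketch with made-up variance components, for one clinic containing two patients with three visits each:

```python
import numpy as np

# Assumed variance components, for illustration only
sigma2 = 1.0    # residual (visit-level) variance
sigma2_p = 2.0  # patient-within-clinic variance
sigma2_c = 3.0  # clinic variance

# One clinic, 2 patients, 3 visits each -> 6 observations
n_pat, n_vis = 2, 3
Z_clinic = np.ones((n_pat * n_vis, 1))                   # clinic indicator
Z_patient = np.kron(np.eye(n_pat), np.ones((n_vis, 1)))  # patient indicators

V = (sigma2_c * Z_clinic @ Z_clinic.T
     + sigma2_p * Z_patient @ Z_patient.T
     + sigma2 * np.eye(n_pat * n_vis))

# Same patient, different visits:  sigma_c^2 + sigma_p^2 = 5.0
# Different patients, same clinic: sigma_c^2             = 3.0
print(V[0, 1], V[0, 3])
```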
Random Effects and Repeated Measures
Now comes the part where your subject-matter knowledge is critical. Is the correlation between pairs of visits constant? In other words, for a patient, is the correlation between visit 1 and visit 2 the same as the correlation between visit 1 and visit 3? Are the variances of the visits roughly equal? In other words, is the between-patient heart rate variance the same at visit 1, at visit 2, and at visit 3?
If the answer to any of these questions is no, then a more general approach is necessary to handle this changing correlation over time. The split-plot approach assumes that visits have the same correlation within a subject regardless of their distance in time; that is exactly what the random patient(clinic) effect implies. If that assumption is not warranted, you could instead use a repeated measures analysis with R-side covariance parameters (beyond the default residual variance, σ²), which allow the within-subject correlation to change with distance in time. For example,
RANDOM clinic;
REPEATED visit / SUBJECT=patient(clinic) TYPE = AR(1);
What does it mean for the covariances?
In the V matrix, the covariance between patients from different clinics is 0. The covariance between different patients from the same clinic is σc². The covariance between two visits from the same patient is σc² + σ²ρ^j, where j is 1 for observations one visit apart, 2 for observations two visits apart, and so on, and ρ is the correlation between adjacent visits (the first-order autocorrelation). The variance of an observation is σc² + σ².
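Here is what that within-patient block looks like numerically, in a NumPy sketch (the variance components and ρ are assumed values, not estimates from any real analysis):

```python
import numpy as np

# Assumed values for illustration
sigma2 = 4.0    # residual variance, sigma^2
sigma2_c = 1.5  # clinic variance, sigma_c^2
rho = 0.6       # first-order autocorrelation
n_visits = 3

# R-side AR(1) block for one patient: sigma^2 * rho^|i - j|
idx = np.arange(n_visits)
R = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

# The clinic random effect adds sigma_c^2 to every pair of
# observations from the same clinic, including these
V_patient = sigma2_c + R

print(V_patient)
```

Notice how the covariance decays with distance in time (σc² + σ²ρ for adjacent visits, σc² + σ²ρ² for visits two apart), which compound symmetry cannot do.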
Is That All There Is to Repeated Measures Analysis in PROC MIXED?
Well, if you’ve modeled the covariance structure of your population reasonably well, then the fun has just begun. Now you are ready to interpret your fixed effects and estimate quantities of interest to answer your research question.
(If you’re still reading, then welcome to my underground nerd lair! I salute you with the secret handshake!)
Alternatively, you could take a different approach altogether. Hierarchical linear models with random coefficients are exceptionally handy in situations where the number of observations per subject and the spacing between measurements vary across subjects.
We discuss the random coefficients approach in the Multilevel Models class, and that’s a topic for another day.
Regardless of the approach you choose, you can accommodate correlation over time through the V matrix in the model, and the MIXED procedure has a number of fit statistics that are useful for model comparison. Also a topic for another day.
I hope this explanation is useful, and that not too many of you got an ABBA song stuck in your head today. See you in class!
13 Comments
Hi Catherine,
I'm trying to find out about how we model repeated measures data, when time is input as a continuous covariate in the model statement. The scenario is a clinical trial: multi-centre, two treatment arms, measurements of a continuous outcome at baseline and then at 4 time-points. I thought of using the following code, where time is continuous:
PROC MIXED data=mydata;
CLASS patient centre;
MODEL Y=Y_0 Treatment time;
RANDOM intercept t/subject=patient(centre);
RUN;
Is this sufficient to model the repeated measures nature of the data, or should I be adding a REPEATED statement too?
Posting a reply on behalf of Catherine:
A REPEATED statement should not be necessary, although I think your code should say:
PROC MIXED data=mydata;
CLASS patient centre Treatment;
MODEL Y=Y_0 Treatment time;
RANDOM intercept time/subject=patient(centre) TYPE=UN; RUN; /* note time instead of t and TYPE=UN on this statement */
You might also want Treatment*time in the model statement. To add this, you can modify the model statement as follows:
MODEL Y=Y_0 Treatment|time;
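To see why the REPEATED statement is usually unnecessary here: the random intercept and slope already make the within-patient covariance depend on time. A NumPy sketch with made-up G entries (TYPE=UN estimates all three: intercept variance, slope variance, and their covariance):

```python
import numpy as np

# Assumed values for illustration: 4 post-baseline time points
times = np.array([1.0, 2.0, 3.0, 4.0])
Z = np.column_stack([np.ones_like(times), times])  # [intercept, slope]

# TYPE=UN: unstructured 2x2 G for (intercept, slope)
G = np.array([[2.0, 0.5],
              [0.5, 0.3]])
sigma2 = 1.0  # residual variance

# Implied within-patient covariance matrix: V = Z G Z' + sigma^2 I
V = Z @ G @ Z.T + sigma2 * np.eye(len(times))
print(V)
```

The implied variances grow with time and the correlations change with both the separation and the position of the time points, so the random coefficients do the work that an R-side structure would otherwise do.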
In the Random Effects and Repeated Measures section, the code is
random clinic;
repeated visit / subject=patient(clinic);
How many subjects does proc mixed report? I have a problem like this, and proc Mixed reports that there is one subject. The rest of the output looks fine, but I would think the number of subjects would be the total number of patients.
Might be a bit late for this comment, but my understanding is that if you have a pre-post design with loss to follow-up (some people dropped out, so you don't have post data for everyone), then a t-test or ANOVA will drop the pre data for anyone missing at post. A mixed model, on the other hand, will retain all the data (i.e., it will keep pre observations even when post is missing). You obviously still don't have the post data, but you don't have to throw away data that may have cost good time and money to collect.
That's a terrific point, Tor-- thank you for pointing it out! If you use the NOBOUND option on the PROC MIXED statement, then the RANDOM statement should be able to converge in this case. But by default, the results could be very different.
Nice blog post.
I came across an applied analysis situation recently where one has the choice of using RANDOM vs. REPEATED to obtain equivalent results with a CS structure in the modeling of dyadic data. Because RANDOM estimates a variance component to represent the correlation of dyad members, the correlation has to be positive, but REPEATED allows for both positive and negative correlations among the dyad members. If the correlation among the dyad members is negative, PROC MIXED will converge with the REPEATED syntax version, but not with the RANDOM syntax version. Kenny, Kashy, and Cook mention this issue in their dyadic data analysis text. Perhaps this will help someone else who encounters a similar situation in their own research.
With best wishes,
Tor Neilands
UCSF
Really good post! It's very interesting to see how you can get the same results. One thing I do to minimise confusion between RANDOM and REPEATED statements is think that REPEATED is when the subject is on the same treatment (and measurements are done at different timepoints). In the above Cross-Over Example I would have fitted a RANDOM statement as the subject was on 2 different treatments, but again it's good to see why you could fit the model with a REPEATED statement.
Thanks,
Kriss
Great question! You're right that there's not a lot of difference between what a mixed model reveals and what an MV repeated measures analysis reveals about the population in the kind of study you describe. You could also analyze that design with a 2-sample t-test on the difference score, if you wanted to keep it really simple. PROC GLM with a REPEATED statement might be less efficient than a simpler covariance structure in MIXED because in fitting a MANOVA model you're estimating three parameters where a CS structure would involve only two. So you can earn back some power in the mixed model, but the results should be very similar between an MV repeated measures analysis and a mixed model.
The analysis of pre-post studies with a between-subjects treatment is always kind of contentious because there are so many ways to attack it. You could compute a difference score and fit a model DIFF=TRT. This is equivalent to a 2-sample t-test on the difference. Another approach is an ANCOVA approach, POST=PRE TRT PRE*TRT, or alternatively, DIFF=PRE TRT PRE*TRT. Each model tells you a little something different, but what they all have in common is an assumption of homogeneous variance. If that assumption isn't a good fit, it is possible in a mixed model to fit heterogeneous variances for the different TRT groups.
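The equivalence between the difference-score model DIFF=TRT and the two-sample t-test can be checked numerically. A Python sketch on simulated data (all values are made up; this is not from any real trial):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated pre-post data: two groups of 20, a true treatment effect of 2
n = 20
trt = np.repeat([0, 1], n)
pre = rng.normal(70, 8, 2 * n)
post = pre + 2.0 * trt + rng.normal(0, 3, 2 * n)
diff = post - pre

# Two-sample (pooled) t-test on the difference scores
d0, d1 = diff[trt == 0], diff[trt == 1]
sp2 = ((n - 1) * d0.var(ddof=1) + (n - 1) * d1.var(ddof=1)) / (2 * n - 2)
t_ttest = (d1.mean() - d0.mean()) / np.sqrt(sp2 * (1 / n + 1 / n))

# OLS of DIFF = TRT: t statistic for the treatment coefficient
X = np.column_stack([np.ones(2 * n), trt])
beta, *_ = np.linalg.lstsq(X, diff, rcond=None)
resid = diff - X @ beta
s2 = resid @ resid / (2 * n - 2)
se_b1 = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
t_ols = beta[1] / se_b1

print(t_ttest, t_ols)  # the two t statistics agree
```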
Really, what I think it boils down to is: if you get the same model from GLM or MIXED, use the procedure you are more comfortable with. If you are likely to expand on this simpler model (such as extending your study to include a 2nd or 3rd follow-up visit), then it is better to have one consistent analytical approach throughout. If all you'll ever look at is pre-post, then a simpler analysis is typically easier to describe to a lay audience.
Thanks for your question and thanks for reading!
Hi Catherine,
Thanks for your post. I have a question and I will appreciate if you could please help me.
I want to estimate the effect of a treatment through Difference-in-Difference modeling. I use matched pair observation ( 1 treated observation is matched to two control). The data set is in long format. I was just wondering if the below code with random statement is correct?
PROC MIXED DATA = ;
CLASS post treatment classification X MatchedID ID;
MODEL outcome=post|treatment X1 X2..Xn / SOLUTION;
LSMEANS post|treatment / DIFF;
ESTIMATE 'D-I-D' post * treatment 1 -1 -1 1;
RANDOM Int / SUBJECT=ID(MatchedID) TYPE=UN;
RUN;
Awesome discussion. As far as I can see, if an experiment has only two measurements, just pretest and posttest, and two groups, experimental and control, then there is no need for PROC MIXED: there cannot be differences in the correlations between time points, because you only have the one correlation.
This describes a lot of educational research, and yet I see people using PROC MIXED in those situations. While there is certainly nothing wrong with that, I don't see how it is any advantage over PROC GLM with a REPEATED statement in those cases. It seems to me it should give you the exact same thing, no?
Is there some new cool amazing advantage here I am missing ? If so, I am sad.