Why is there only one subject in my mixed model?


This morning I had to send my watch off to Switzerland to have its guts replaced. I will be late to everything for the next two weeks. So it was timely that when I got to work, my inbox had the following question from a former student in the Mixed Models class:

I often need to perform an analysis of RCB data. When I set up PROC MIXED to generate the terms of the model using a random statement for blocks, the dimensions are incorrect.

For example, with 16 blocks and 3 measurements per block, I get subjects=1 and max observations per subject = 48 instead of the correct 16 and 3 respectively.

random block;

What am I doing wrong?

 

The user isn't doing anything wrong, and the analysis is correct. However, the Dimensions table can be confusing because “Subjects” doesn't necessarily refer to the physical blocks in the study. Depending on the study, the blocks could be lots, patients, litters (sires), clinics, and so on; to your software, they're all just random effects. In the Dimensions table in PROC MIXED, Subjects refers to the number of identically structured blocks on the diagonal of the V matrix that are processed separately, not to the number of levels of the random effect.

For example,

random int / subject = block;

would produce one column of Z per block and a 1x1 G matrix per block, and when V is computed, PROC MIXED processes one subject at a time. The Dimensions table in PROC MIXED then reports 16 subjects. This is computationally efficient because, rather than processing a Z matrix with 16 columns and a 16x16 G matrix, it processes a column vector Z and a scalar G 16 times.
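To see where "16 subjects, 3 observations per subject" comes from, here is the covariance structure written out. This assumes the default residual structure R = σ²I and a random block intercept with variance σ_b²; the notation is ours, not PROC MIXED output. For block i, Z_i is a column of three ones and G_i is the scalar σ_b², so

$$
V_i = Z_i G_i Z_i^\top + R_i = \sigma_b^2\,\mathbf{1}_3\mathbf{1}_3^\top + \sigma^2 I_3,
\qquad
V = \operatorname{diag}(V_1, \dots, V_{16}),
\qquad
\lvert V \rvert = \prod_{i=1}^{16} \lvert V_i \rvert .
$$

Processing by subjects means 16 small 3x3 computations (Subjects = 16, Max Obs per Subject = 3). With `random block;` the same V is handled as one 48x48 matrix (Subjects = 1, Max Obs per Subject = 48).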

You can get the same analysis with:

random block;

but the second version takes more computational resources because Z has 16 columns and G is a full 16x16 matrix, all processed as a single subject with 48 observations.
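The claim that both RANDOM statements give the same analysis can be checked numerically. The sketch below is plain Python, not SAS, and is not how PROC MIXED is implemented internally. It assumes a random block intercept with variance 2, residual variance 1, R = σ²I, and made-up residuals. It computes the two V-dependent pieces of the normal -2 log-likelihood, r'V⁻¹r and log|V|, once with V treated as a single 48x48 block and once as 16 separate 3x3 blocks.

```python
import math
import random


def build_v(block_ids, var_block, var_resid):
    """V = Z G Z' + R for a random block intercept with G = var_block * I
    and R = var_resid * I: entry (i, j) gets var_block when rows i and j
    share a block, plus var_resid on the diagonal."""
    n = len(block_ids)
    return [[(var_block if block_ids[i] == block_ids[j] else 0.0)
             + (var_resid if i == j else 0.0)
             for j in range(n)] for i in range(n)]


def solve_and_logdet(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting and
    return (x, log|det(a)|)."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    logdet = 0.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        pivot = m[k][k]
        logdet += math.log(abs(pivot))
        for i in range(k + 1, n):
            f = m[i][k] / pivot
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x, logdet


def v_terms(resid, block_ids, var_block, var_resid, by_subject):
    """Return (r' V^-1 r, log|V|, subjects, max obs per subject)."""
    if by_subject:
        groups = {}
        for i, b in enumerate(block_ids):
            groups.setdefault(b, []).append(i)
        parts = list(groups.values())  # one diagonal block of V per subject
    else:
        parts = [list(range(len(block_ids)))]  # one "subject": all of V
    quad, logdet = 0.0, 0.0
    for idx in parts:
        r = [resid[i] for i in idx]
        v = build_v([block_ids[i] for i in idx], var_block, var_resid)
        x, ld = solve_and_logdet(v, r)
        quad += sum(ri * xi for ri, xi in zip(r, x))
        logdet += ld
    return quad, logdet, len(parts), max(len(idx) for idx in parts)


random.seed(1)
blocks = [b for b in range(16) for _ in range(3)]  # 16 blocks x 3 obs
resid = [random.gauss(0.0, 1.0) for _ in blocks]   # made-up residuals

full = v_terms(resid, blocks, 2.0, 1.0, by_subject=False)
by_subj = v_terms(resid, blocks, 2.0, 1.0, by_subject=True)
print("one big block: quad=%.6f logdet=%.6f subjects=%d max obs=%d" % full)
print("by subject:    quad=%.6f logdet=%.6f subjects=%d max obs=%d" % by_subj)
```

Both runs report the same quadratic form and log-determinant (with these variances each 3x3 block has eigenvalues 1, 1, and 7, so log|V| = 16 log 7); only the bookkeeping differs, which mirrors Subjects = 1 with 48 observations versus Subjects = 16 with 3 observations each. The by-subject run does 16 eliminations on 3x3 matrices instead of one on a 48x48 matrix, which is where the savings come from.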

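For an added bit of savings, sort the data by block first, and then leave block off the CLASS statement to skip creating the Class Level Information table. The sort matters: when the SUBJECT= variable is not in the CLASS statement, PROC MIXED starts a new subject whenever its value changes from one observation to the next. It's not such a big savings with 16 blocks, but if you had 892 of them, you'd be glad to save as many CPU seconds as possible.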
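Here is a small Python sketch (not SAS) of why sorting is needed before dropping block from the CLASS statement. It mimics the rule that a SUBJECT= variable left off the CLASS statement starts a new subject whenever its value changes. The data layout, a hypothetical RCB file stored treatment by treatment, is an assumption for illustration.

```python
from itertools import groupby


def subjects_by_value_change(subject_col):
    """Count subjects the way a SUBJECT= variable that is NOT in the CLASS
    statement is read: a new subject starts whenever the value differs
    from the previous observation."""
    return sum(1 for _ in groupby(subject_col))


# Hypothetical RCB file stored treatment by treatment (3 treatments x 16 blocks).
by_treatment = [blk for _trt in range(3) for blk in range(1, 17)]
sorted_by_block = sorted(by_treatment)

print(subjects_by_value_change(by_treatment))     # unsorted -> 48
print(subjects_by_value_change(sorted_by_block))  # sorted by block -> 16
print(len(set(by_treatment)))                     # distinct blocks -> 16
```

Unsorted, every observation looks like a new subject (48 "subjects" of one observation each); after sorting by block, the count matches the 16 physical blocks.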

Now if someone could just save my watch, I'd be processing more efficiently as well!


About Author

Catherine Truxillo

Catherine Truxillo, Ph.D. has been a Statistical Training Specialist at SAS since 2000 and has written or co-written SAS training courses for advanced statistical methods including: multivariate statistics, linear and generalized linear mixed models, multilevel models, structural equation models, imputation methods for missing data, statistical process control, design and analysis of experiments, and cluster analysis. Although she primarily works with advanced statistics topics, she also teaches SAS courses using SAS/IML (the interactive matrix language), SAS Enterprise Guide, SAS Enterprise Miner, SAS Forecast Studio, and JMP software. Before coming to SAS, Catherine completed her Ph.D. in Social Psychology with an emphasis in Statistics at The University of Texas at Austin. While at UT Austin, she completed an internship with the Math and Computer Science department's statistical consulting help desk and taught a number of undergraduate courses. While teaching and performing her own graduate research, she worked for a software usability design company conducting experiments to assess the ease-of-use of various software interfaces and website designs. Cat's personal interests include triathlon, hiking the woods near her home in North Carolina, and having tea parties with her two children.

4 Comments

  1. Cat Truxillo on

    Hi Patrick-
    If you use an appropriate DDFM, the degrees of freedom do not depend on whether your model is processed by subjects. Without seeing your example, I can't say for sure what the problem was in your model, but I am going to venture a guess: were you using containment DF without explicitly specifying a nesting structure?

    • Patrick Darken on

      I was using the default. When I use KR, I get a reasonable number for the DF, but can I trust that the correlation structure is being appropriately accounted for in the variance when it says there is only one subject?

      • Yes -- as the blog points out, the Subjects label does not specifically refer to the number of cases or the number of levels of the random effect. It only refers to the number of blocks of the covariance matrix when the matrix is processed by subjects. The same matrix can be processed as one large block or as a block-diagonal matrix with each sub-block processed separately. The results are identical, but the computer will take longer to process the full matrix than it will to process by subjects.

  2. Patrick Darken on

    Are the degrees of freedom correct for the tests of whole-plot effects, though? I am having a similar problem. With just a REPEATED statement, SAS seems to recognize the subjects correctly, and the tests of the whole-plot factors use approximately the right degrees of freedom. But when I add a RANDOM statement, I am back to one subject with 700+ observations, and the tests of the whole-plot factors use the wrong degrees of freedom.
