Why is there only one subject in my mixed model?


This morning I had to send my watch off to Switzerland to have its guts replaced. I will be late to everything for the next two weeks. So it was timely that when I got to work, my inbox had the following question from a former student in the Mixed Models class:

I often need to perform an analysis of RCB data. When I set up PROC MIXED to generate the terms of the model using a random statement for blocks, the dimensions are incorrect.

For example, with 16 blocks and 3 measurements per block, I get subjects=1 and max observations per subject = 48 instead of the correct 16 and 3 respectively.

random block;

What am I doing wrong?

 

The user isn't doing anything wrong, and the analysis is correct. However, the Dimensions table can be confusing because “Subjects” doesn't necessarily refer to the physical blocks in the study. Of course, blocks could refer to lots, patients, litters (sires), clinics, etc. To your software, they're all just random effects. In the Dimensions table in PROC MIXED, Subjects refers to the number of separate diagonal blocks into which the V matrix is partitioned for processing.

For example,

random int / subject = block;

would produce one column of Z per block and a 1x1 G matrix per block; when V is computed, PROC MIXED processes one subject at a time. The Dimensions table then reports 16 subjects. This is computationally efficient because, rather than processing a Z matrix with 16 columns and a 16x16 G matrix, the procedure processes a column vector Z and a scalar G 16 times.
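To make the bookkeeping concrete, here is a sketch of the covariance structure for the 16-block, 3-observation example. The symbols $\sigma^2_b$ (block variance), $\sigma^2$ (residual variance), $\mathbf{1}_3$ (a column of three ones), and $\mathbf{J}_3$ (a 3x3 matrix of ones) are notation assumed here for illustration; the sketch also assumes the default independent residual structure and data sorted by block.

```latex
\begin{align*}
\mathbf{Z}_i &= \mathbf{1}_3, \qquad \mathbf{G}_i = \sigma^2_b, \qquad \mathbf{R}_i = \sigma^2 \mathbf{I}_3
  && \text{(pieces for block } i\text{)} \\
\mathbf{V}_i &= \mathbf{Z}_i \mathbf{G}_i \mathbf{Z}_i^{\top} + \mathbf{R}_i
  = \sigma^2_b \mathbf{J}_3 + \sigma^2 \mathbf{I}_3
  && \text{(one } 3 \times 3 \text{ block per subject)} \\
\mathbf{V} &= \operatorname{diag}(\mathbf{V}_1, \ldots, \mathbf{V}_{16})
  && \text{(the full } 48 \times 48 \text{ matrix)}
\end{align*}
```

Under these assumptions, building V from a single 48x16 Z and a 16x16 G equal to $\sigma^2_b \mathbf{I}_{16}$ gives the same block-diagonal matrix, only assembled in one piece rather than 16 small ones.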

You can get the same analysis with:

random block;

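but this second version takes more computational resources because Z has 16 columns and G is 16x16, all processed as a single subject. That is why the Dimensions table shows 1 subject with a maximum of 48 observations per subject.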
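For reference, here is a minimal sketch of the two specifications as complete PROC MIXED steps. The data set name rcb, the response y, and the treatment variable trt are hypothetical names chosen for illustration; only the RANDOM statements come from the discussion above.

```sas
/* Hypothetical data set RCB: 16 blocks, 3 observations per block. */

/* Version 1: processed by subject.                               */
/* The Dimensions table should report 16 subjects with a maximum  */
/* of 3 observations per subject.                                 */
proc mixed data=rcb;
   class block trt;
   model y = trt;
   random int / subject=block;
run;

/* Version 2: the same model processed as one large block.        */
/* The Dimensions table should report 1 subject with a maximum    */
/* of 48 observations per subject.                                */
proc mixed data=rcb;
   class block trt;
   model y = trt;
   random block;
run;
```

Comparing the Covariance Parameter Estimates and Type 3 Tests of Fixed Effects tables from the two runs is a quick way to confirm that only the Dimensions table differs.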

For an added bit of savings when you use the SUBJECT= form, sort the data by block first, and then leave block off the CLASS statement to skip creating the class level table (block must be a numeric variable for this to work). It's not such a big savings with 16 blocks, but if you had 892 of them, you’d be glad to save as many CPU seconds as possible.
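The sorting tip might look like the sketch below. As before, rcb, y, and trt are hypothetical names, and the example assumes block is a numeric variable, since a variable left off the CLASS statement is treated as continuous.

```sas
/* Sort first: with block off the CLASS statement, PROC MIXED     */
/* starts a new subject each time the value of block changes,     */
/* so all rows for a block must be adjacent.                      */
proc sort data=rcb;
   by block;
run;

proc mixed data=rcb;
   class trt;                      /* block deliberately omitted  */
   model y = trt;
   random int / subject=block;     /* block must be numeric here  */
run;
```

If the data are not sorted, each change in the value of block would be read as a new subject, so the subject count in the Dimensions table is worth checking after the run.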

Now if someone could just save my watch, I'd be processing more efficiently as well!


About Author

Catherine (Cat) Truxillo

Director of Analytical Education, SAS

Catherine Truxillo, Ph.D. has written or co-written SAS training courses for advanced statistical methods, including: multivariate statistics, linear and generalized linear mixed models, multilevel models, structural equation models, imputation methods for missing data, statistical process control, design and analysis of experiments, and cluster analysis. She also teaches courses on leadership and communication in data science.

4 Comments

  1. Cat Truxillo on

    Hi Patrick-
    If you use an appropriate DDFM, it does not matter with regard to DF whether your model is processed by subjects or not. Without seeing your example, I can't really say for sure what the problem was in your model, but I am going to venture a guess that maybe you were using containment DF without explicitly specifying a nesting structure?

    • Patrick Darken on

      I was using the default. When I use KR I get a reasonable number for the DF but can I trust that the correlation structure is being appropriately accounted for in the variance when it says there is only one subject?

      • Yes -- as the blog points out, the Subjects label does not specifically refer to the number of cases or the number of levels of the random effect. It only refers to the number of blocks of the covariance matrix when the matrix is processed by subjects. The same matrix can be processed as one large block or as a block-diagonal matrix with each sub-block processed separately. The results are identical, but the computer will take longer to process the full matrix than it will to process by subjects.

  2. Patrick Darken on

    Are the degrees of freedom correct for the test of whole plot effects though? I am having a similar problem. With just a repeated statement, SAS seems to recognize the subjects correctly and the tests of the whole plot factors use approximately the right degrees of freedom. But when I add a random statement, I am back to one subject with 700+ observations and the tests of the whole plot factors use the wrong degrees of freedom.
