Most SAS regression procedures support the "stars and bars" operators, which enable you to create models that include main effects and all higher-order interaction effects. You can also easily create models that include all n-way interactions up to a specified value of n. However, it can be a challenge to specify models that include many—but not all!—higher-order interactions. This article describes a little-known trick: you can use COLLECTION effects to specify interaction terms.
Stars and Bars: Building models with interaction terms in SAS
Many of the regression procedures in SAS (such as GLM, GENMOD, LOGISTIC, MIXED,...) support the bar operator (|) to specify all interactions between effects. For example, the following MODEL statement specifies that the model should include all main effects and all higher-order interactions:
proc logistic;
   model Y = x1 | x2 | x3 | x4;   /* all main effects and interactions */
run;
The previous MODEL statement includes all two-way, three-way, and four-way interaction effects. The statement is equivalent to the following statement that uses the star operator (*) to explicitly specify each interaction term:
model Y = x1 x2 x3 x4                           /* all main effects */
          x1*x2 x1*x3 x1*x4 x2*x3 x2*x4 x3*x4   /* all two-way interactions */
          x1*x2*x3 x1*x2*x4 x1*x3*x4 x2*x3*x4   /* all three-way interactions */
          x1*x2*x3*x4;                          /* all four-way interactions */
Fitting a model with so many effects can easily lead to overfitting, so in practice an analyst might restrict the model to main effects and two-way interactions. Again, SAS supplies an easy syntax. You can use the "at" operator (@) to specify the highest order of interaction to include in the model. For example, the following syntax specifies that the model contains only main effects and two-way interactions:
model Y = x1 | x2 | x3 | x4 @2;   /* main effects and two-way interactions */
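The same operator works for other cutoffs. As a small illustration, the following sketch uses @3 to keep everything except the four-way interaction. The data set name Have is a placeholder, and the example assumes that it contains a binary response Y and continuous variables x1-x4.

```sas
/* Sketch only: Have is a hypothetical data set that contains Y and x1-x4 */
proc logistic data=Have;
   /* main effects, two-way interactions, and three-way interactions;
      the x1*x2*x3*x4 term is excluded by the @3 cutoff */
   model Y = x1 | x2 | x3 | x4 @3;
run;
```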
Specifying many, but not all, interaction terms
Unfortunately, there is no simple syntax for constructing many, but not all, interaction effects. This can be frustrating when there is a structure to the interaction terms. A common structure is that there are two lists of variables and you want to build all interactions that involve one effect from the first list and one effect from the second list.
For example, suppose you want to create the following interaction effects:
c1*x1 c1*x2 c2*x1 c2*x2
The interaction terms are the pairwise combinations of the variables {c1 c2} with the variables {x1 x2}. Note, however, that within-list interactions are not desired: there are no terms for c1*c2 or x1*x2.
It would be great to have some kind of shorthand notation that tells SAS to "cross all elements in the first list with all elements in the second list." A natural syntax would be
(c1 c2) | (x1 x2)
but unfortunately that syntax is not supported.
Some SAS programmers might use the macro language to generate all pairwise interactions between two lists of variables, but COLLECTION effects offer an easier way.
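For comparison, the macro approach might look something like the following minimal sketch. The macro name %CrossLists, its parameters, and the data set Have are hypothetical names chosen for illustration. The sketch assumes that each list is a space-separated list of variable names and that c1 and c2 are classification variables.

```sas
/* Hypothetical helper macro: emit v*w for every variable v in list1
   and every variable w in list2 */
%macro CrossLists(list1, list2);
   %local i j;
   %do i = 1 %to %sysfunc(countw(&list1, %str( )));
      %do j = 1 %to %sysfunc(countw(&list2, %str( )));
         %scan(&list1, &i, %str( ))*%scan(&list2, &j, %str( ))
      %end;
   %end;
%mend CrossLists;

/* write the four generated cross terms to the log */
%put Terms: %CrossLists(c1 c2, x1 x2);

/* use the generated terms in a MODEL statement */
proc logistic data=Have;      /* Have is a placeholder data set */
   class c1 c2;               /* assumes c1 and c2 are classification variables */
   model Y = c1 c2 x1 x2 %CrossLists(c1 c2, x1 x2);
run;
```

The macro works, but the reader has to expand it mentally to see which terms are in the model.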
COLLECTION effects
More than a dozen regression procedures in SAS support the EFFECT statement. According to the documentation, the EFFECT statement generates "special collections of columns for design matrices." In particular, the so-called COLLECTION effect enables you to specify multiple variables that are "considered as a unit."
As a colleague recently reminded me, you can use COLLECTION effects to specify interactions. If V and W are two collection effects, then V*W contains all pairwise interactions of the individual variables in V with the individual variables in W. Similarly, V | W contains all main effects and the pairwise interaction effects.
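Applied to the two small lists {c1 c2} and {x1 x2} from the previous section, the idea might be sketched as follows. The data set Have and the effect names CList and XList are placeholders, and the sketch assumes that c1 and c2 are classification variables.

```sas
/* Sketch only: Have, CList, and XList are hypothetical names */
proc logistic data=Have;
   class c1 c2;                       /* assumes c1 and c2 are categorical */
   effect CList = collection(c1 c2);  /* first list */
   effect XList = collection(x1 x2);  /* second list */
   /* CList*XList supplies the cross terms c1*x1 c1*x2 c2*x1 c2*x2
      without main effects; use CList | XList to add the main effects */
   model Y = CList*XList;
run;
```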
As an example of using COLLECTION effects, the following model uses two classification variables and four continuous variables in the Sashelp.Heart data. Here is the model specified in the usual way:
proc logistic data=Sashelp.Heart;
   class BP_Status Sex;
   model Status = BP_Status Sex Cholesterol Height Weight MRW
                  BP_Status*Cholesterol BP_Status*Height BP_Status*Weight BP_Status*MRW
                  Sex*Cholesterol Sex*Height Sex*Weight Sex*MRW;
   ods select ParameterEstimates;
   ods output ParameterEstimates = Parm1;
run;
Manually enumerating all those interaction terms requires a lot of typing. More importantly, the enumeration does not make it clear that the interaction terms are the pairwise interactions between the classification variables and the continuous variables. In contrast, the following statements use COLLECTION effects to define two sets of variables. The MODEL statement uses the familiar bar operator to form all main effects and pairwise interactions between the variables.
proc logistic data=Sashelp.Heart;
   class BP_Status Sex;
   effect V = collection(BP_Status Sex);                  /* one list */
   effect W = collection(Cholesterol Height Weight MRW);  /* another list */
   model Status = V | W;      /* vars and interactions between the var lists */
   ods select ParameterEstimates;
   ods output ParameterEstimates = Parm2;
run;
The second model statement is more concise. The two models produce equivalent predictions, but the second is much easier to type and to interpret.
You can use PROC COMPARE to show that the parameter estimates are the same (to eight decimal places), and therefore the predicted values will be the same. Because the order of the parameters differs between models, the parameter estimates are sorted before running the comparison.
proc sort data=Parm1; by Estimate; run;
proc sort data=Parm2; by Estimate; run;

proc compare brief method=absolute criterion=1e-8
             base   =Parm1(drop=Variable)
             compare=Parm2(drop=Variable ClassVal:);
run;

NOTE: All values compared are within the equality criterion used.
This use of the COLLECTION effect is somewhat nonstandard. SAS introduced COLLECTION effects for variable selection routines such as the "group LASSO" as a way to specify that all variables in the collection should be included in the model, or all should be excluded. The variables enter or leave the model "as a unit."
Although most tables and statistics from PROC LOGISTIC are the same for the two models, there are differences. One difference is the "Type 3 Analysis of Effects," which tests whether all the parameters associated with an effect are zero. The first call to PROC LOGISTIC analyzes 14 effects; the second call analyzes three (collection) effects. You can use the EFFECT statement to create POLYNOMIAL effects, which behave like the usual "star and bar" effects.
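One way to see the difference in the Type 3 tests is to display only that table for each model, as in the following sketch. It assumes that the ODS name of the "Type 3 Analysis of Effects" table in PROC LOGISTIC is Type3; check the procedure's ODS table names if the table is not selected.

```sas
/* Model 1: explicit enumeration; the table should list 14 effects */
proc logistic data=Sashelp.Heart;
   class BP_Status Sex;
   model Status = BP_Status Sex Cholesterol Height Weight MRW
                  BP_Status*Cholesterol BP_Status*Height BP_Status*Weight BP_Status*MRW
                  Sex*Cholesterol Sex*Height Sex*Weight Sex*MRW;
   ods select Type3;          /* assumed ODS table name */
run;

/* Model 2: collection effects; the table should list V, W, and V*W */
proc logistic data=Sashelp.Heart;
   class BP_Status Sex;
   effect V = collection(BP_Status Sex);
   effect W = collection(Cholesterol Height Weight MRW);
   model Status = V | W;
   ods select Type3;          /* assumed ODS table name */
run;
```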
In summary, the EFFECT statement provides a way to treat sets of variables "as a unit." This leads to a simple syntax for forming specific interaction effects. The example in this article shows how to create pairwise interactions, but the COLLECTION effects (and POLYNOMIAL effects) can also be used to specify higher-order interactions.
2 Comments
Fantastic! Can I use 'collection' syntax in proc mixed as well?
PROC MIXED does not support the EFFECT statement. However, PROC GLIMMIX does. Also, you might want to consider using POLYNOMIAL effects.