Construct polynomial effects in SAS regression models


If you use SAS regression procedures, you are probably familiar with the "stars and bars" notation, which enables you to construct interaction effects in regression models. Although you can construct many regression models by using that classical notation, a friend recently reminded me that the EFFECT statement in SAS provides greater control over the interaction terms in a regression model.

The EFFECT statement is supported in many SAS procedures, including the GLIMMIX, GLMSELECT, and LOGISTIC procedures. Because those procedures can write the design matrix to a data set, you can use a model that is generated by an EFFECT statement in any SAS procedure, even older procedures that do not support the EFFECT statement. I have previously shown how you can use the SPLINE option in the EFFECT statement to generate spline effects. This article deals with polynomial effects, which are formed by elementwise multiplication of continuous variables, such as x1*x1, x1*x2, and x1*x1*x2*x3.
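For example, here is a minimal sketch of that workflow. It assumes the OUTDESIGN= option in PROC GLMSELECT and the &_GLSMOD macro variable that GLMSELECT creates to hold the names of the design columns; verify both in the documentation for your SAS release. Because GLMSELECT fits least squares models, the sketch uses the continuous variable Cholesterol as the response. The data set name PolyDesign is arbitrary.

```sas
/* Sketch: write the design matrix for a degree-2 polynomial effect to a
   data set, then fit it with PROC REG, which has no EFFECT statement.
   Assumes GLMSELECT's OUTDESIGN= option and &_GLSMOD macro variable. */
proc glmselect data=Sashelp.Heart outdesign=PolyDesign;
   effect poly2 = polynomial(Height Weight / degree=2);
   model Cholesterol = poly2 / noint selection=none;  /* keep every term; no intercept column */
run;

%put NOTE: Design columns are &_GLSMOD;   /* names of the design-matrix columns */

proc reg data=PolyDesign plots=none;
   model Cholesterol = &_GLSMOD;           /* PROC REG adds its own intercept */
run;
quit;
```

The NOINT option keeps an intercept column out of the design matrix so that it does not duplicate the intercept that PROC REG adds.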

The following statements rename a few continuous variables in the Sashelp.Heart data set, which will be used in the examples. The new variables (x1, x2, x3, and x4) are easier to type, and using these short variable names makes it easier to see how interaction effects are formed.

data Heart / view=Heart;
set Sashelp.Heart;
rename Height=x1  Weight=x2  Diastolic=x3  Systolic=x4;
run;

A review of "stars and bars" notation in SAS

Recall that many SAS regression procedures (such as GLM, GENMOD, LOGISTIC, MIXED,...) support the bar operator (|) to specify interactions between effects. For example, the following MODEL statement specifies that the model should include all main effects and all higher-order interactions:

proc logistic data=Heart;
   model Status = x1 | x2 | x3 | x4;   /* all main effects and interactions up to 4-way */
run;

The previous MODEL statement includes all two-way, three-way, and four-way interaction effects between distinct variables. In practice, fitting a model with so many effects can lead to overfitting, so analysts often restrict the model to two-way interactions. In SAS you can use the "at" operator (@) to specify the highest order of interaction in the model. For example, the following syntax specifies that the model contains only main effects and two-way interactions:

model Status = x1 | x2 | x3 | x4 @2;   /* main effects and two-way interactions */
/* equivalent: model Status = x1 x2 x3 x4   x1*x2 x1*x3 x1*x4   x2*x3 x2*x4   x3*x4; */

Use the EFFECT statement to build polynomial effects

Notice that the bar operator does not generate the interaction of a variable with itself. For example, the terms x1*x1 and x2*x2 are not generated. Notice also that you need to explicitly type out each variable when you use the bar operator. You cannot use "colon notation" (x:) or hyphens (x1-x4) to specify a range of variables. For four variables, this is an inconvenience; for hundreds of variables, this is a serious problem.

The EFFECT statement enables you to create polynomial effects, which have the following advantages:

  • You can use colon notation or a hyphen to specify a range of variables. (You can also specify a space-separated list of variable names.)
  • You can control the degree of the polynomial terms and the maximum value of the exponent that appears in each term.
  • You can generate interactions between a variable and itself.

The POLYNOMIAL option in the EFFECT statement is described in terms of multivariate polynomials. Recall that a multivariate monomial is a product of powers of variables that have nonnegative integer exponents. For example, $x^2 y z$ is a monomial that contains three variables. The degree of a monomial is the sum of the exponents of the variables. The previous monomial has degree 4 because 4 = 2 + 1 + 1.

Use the EFFECT statement to generate two-way interactions

The syntax for the EFFECT statement is simple. For example, to generate all main effects and two-way interactions, including second-degree terms like x1*x1 and x2*x2, use the following syntax:

ods select ParameterEstimates(persist);   /* display only parameter estimates in this and later procedures */
proc logistic data=Heart;
   effect poly2 = polynomial(x1-x4 / degree=2);
   model Status = poly2;
/* equivalent:  model Status = x1 | x2 | x3 | x4  @2     
                               x1*x1 x2*x2 x3*x3 x4*x4;  */
run;

The name of the effect is 'poly2'. It is a polynomial effect that contains all first- and second-degree monomials in the variables x1-x4. Thus it contains the main effects, the two-way interactions between distinct variables, and the terms x1*x1, x2*x2, x3*x3, and x4*x4. The equivalent model in "stars and bars" notation is shown in the comment. The models are the same, although the terms are listed in a different order.
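Counting the terms is a quick way to check that a specification produces the model you intend. The number of monomials of degree exactly k in p variables is a binomial coefficient, so for p = 4 variables and DEGREE=2 the poly2 effect contains 4 main effects plus 10 second-degree terms (4 squares and 6 cross products):

```latex
\#\{\text{monomials of degree } k \text{ in } p \text{ variables}\} = \binom{p+k-1}{k},
\qquad
\sum_{k=1}^{d} \binom{p+k-1}{k} = \binom{p+d}{d} - 1

p = 4,\; d = 2:\quad \binom{4}{1} + \binom{5}{2} = 4 + 10 = 14 = \binom{6}{2} - 1
```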

You can also use the colon operator to select variables that have a common prefix. For example, to specify all polynomial effects for variables that begin with the prefix 'x', you can use EFFECT poly2 = POLYNOMIAL(x: / DEGREE=2).

You can use the MDEGREE= option to control the maximum degree of any exponent in a monomial. For example, the monomials x1*x1 and x1*x2 are both second-degree monomials, but the maximum exponent that appears in the first monomial is 2 whereas the maximum exponent that appears in the second monomial is 1. The following syntax generates only monomial terms for which the maximum exponent is 1, which means that x1*x1 and x2*x2 will not appear:

proc logistic data=Heart;
   effect poly21 = polynomial(x1-x4 / degree=2 mdegree=1);  /* exclude x1*x1, x2*x2, etc. */
   model Status = poly21;
/* equivalent:  model Status = x1 | x2 | x3 | x4  @2    */
run;
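The MDEGREE= option matters more when DEGREE= exceeds 2. As an illustration (the choice of Height and Weight from Sashelp.Heart and the output data set name PE_Deg3 are for this sketch only), DEGREE=3 with MDEGREE=2 keeps the third-degree cross terms but drops the pure cubes:

```sas
/* Sketch: third-degree terms are allowed, but no variable may
   appear with an exponent larger than 2. */
ods output ParameterEstimates=PE_Deg3;    /* save the estimates table */
proc logistic data=Sashelp.Heart;
   effect poly32 = polynomial(Height Weight / degree=3 mdegree=2);
   model Status = poly32;
/* equivalent:  model Status = Height Weight
                               Height*Height Height*Weight Weight*Weight
                               Height*Height*Weight Height*Weight*Weight; */
run;
```

Without MDEGREE=2, the effect would also contain Height*Height*Height and Weight*Weight*Weight, for nine terms in all.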

Use the EFFECT statement to generate two-way interactions between lists of variables

A novel use of polynomial effects is to generate all two-way interactions between variables in one list and variables in another list. For example, suppose you are interested in the interactions between the lists (x1 x2) and (x3 x4), but you are not interested in within-list interactions such as x1*x2 and x3*x4. By using polynomial effects, you can create the two lists and then use the bar operator to request all main effects and two-way interactions between elements of the lists, as follows:

proc logistic data=Heart;
   effect list1 = polynomial(x1-x2);    /* first list of variables */
   effect list2 = polynomial(x3-x4);    /* second list of variables */
   model Status = list1 | list2;        /* main effects and pairwise interactions between lists */
/* equivalent:
   model Status = x1 x2                    
                  x3 x4                
                  x1*x3 x1*x4 x2*x3 x2*x4; */
run;

Notice that you can use the EFFECT statement multiple times in the same procedure. This is a powerful technique! By using multiple EFFECT statements, you can name sets of variables that have similar properties, such as demographic effects (age, sex), lifestyle effects (diet, exercise), and physiological effects (blood pressure, cholesterol). You can then easily construct models that contain interactions between these sets, such as LIFESTYLE | PHYSIOLOGY.
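As a sketch of this idea with the Sashelp.Heart data, the following statements define two named groups and request the interactions between them. Assigning Smoking and Weight to a "lifestyle" group and the blood pressure and cholesterol measurements to a "physiology" group is a hypothetical grouping chosen for illustration, as is the output data set name PE_Groups.

```sas
/* Sketch: name two groups of continuous variables, then request the
   main effects and all pairwise interactions between the groups.
   The grouping itself is illustrative only. */
ods output ParameterEstimates=PE_Groups;   /* save the estimates table */
proc logistic data=Sashelp.Heart;
   effect lifestyle  = polynomial(Smoking Weight);                 /* DEGREE=1 by default */
   effect physiology = polynomial(Diastolic Systolic Cholesterol);
   model Status = lifestyle | physiology;   /* main effects + between-group interactions */
run;
```

The model contains 2 + 3 = 5 main effects and 2 × 3 = 6 between-group interactions, but no within-group products such as Smoking*Weight or Diastolic*Systolic.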

In summary, the POLYNOMIAL option in the EFFECT statement enables you to control the interactions between variables in a model. You can form models that are equivalent to the "stars and bars" syntax or specify more complex models. In particular, you can use the EFFECT statement multiple times to construct lists of variables and interactions between elements of the lists. For more information about polynomial effects, see the SAS documentation for the POLYNOMIAL option in the EFFECT statement.


About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of PROC IML and SAS/IML Studio. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.
