SAS regression procedures support several parameterizations of classification variables. When a categorical variable is used as an explanatory variable in a regression model, the procedure generates dummy variables that are used to construct a design matrix for the model. The process of forming columns in a design matrix is called a *parameterization* or *encoding*. In SAS, most regression procedures use either the GLM encoding, the EFFECT encoding, or the REFERENCE encoding. This article summarizes the default and optional encodings for each regression procedure in SAS/STAT. In many SAS procedures, you can use the PARAM= option to change the default encoding.

The documentation section "Parameterization of Model Effects" provides a complete list of the encodings in SAS and shows how the design matrices are constructed from the levels. (The *levels* are the values of a classification variable.) Pasta (2005) gives examples and further discussion.
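
As a minimal sketch of how the PARAM= option is typically specified, the following call asks PROC LOGISTIC for the REFERENCE encoding instead of its default EFFECT encoding. The data set (Sashelp.Heart), the model variables, and the REF=FIRST choice are assumptions made purely for illustration.

```sas
/* Illustrative sketch: the data set and variables are assumptions */
proc logistic data=sashelp.heart;
   /* PARAM=REF overrides the default EFFECT encoding;
      REF=FIRST makes the first ordered level the reference level */
   class Sex Smoking_Status / param=ref ref=first;
   model Status(event='Dead') = Sex Smoking_Status AgeAtStart;
run;
```

In procedures that support it, the PARAM= option follows the slash in the CLASS statement, so the same pattern should carry over to those procedures.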

### Default and optional encodings for SAS regression procedures

The following SAS regression procedures support the CLASS statement or a similar syntax. The columns GLM, REFERENCE, and EFFECT indicate the three most common encodings. In those columns, the word "Default" marks the procedure's default encoding and "Yes" marks an encoding that is supported as an option. For procedures that support the PARAM= option, the PARAM= column lists the supported encodings. The word *All* means that the procedure supports the complete list of SAS encodings. Most procedures default to using the GLM encoding; the exceptions are highlighted.

| Procedure | GLM | REFERENCE | EFFECT | PARAM= |
|---|---|---|---|---|
| ADAPTIVEREG | Default | | | |
| ANOVA | Default | | | |
| BGLIMM | Default | Yes | Yes | GLM, EFFECT, REF |
| CATMOD | | | **Default** | |
| FMM | Default | | | |
| GAM | Default | | | |
| GAMPL | Default | Yes | | GLM, REF |
| GEE | Default | | | |
| GENMOD | Default | Yes | Yes | All |
| GLIMMIX | Default | | | |
| GLM | Default | | | |
| GLMSELECT | Default | Yes | Yes | All |
| HP regression procedures | Default | Yes | | GLM, REF |
| HPMIXED | Default | | | |
| ICPHREG | Default | Yes | Yes | All |
| LIFEREG | Default | | | |
| LOGISTIC | Yes | Yes | **Default** | All |
| MIXED | Default | | | |
| ORTHOREG | Default | Yes | Yes | All |
| PLS | Default | | | |
| PROBIT | Default | | | |
| PHREG | Yes | **Default** | Yes | All |
| QUANTLIFE | Default | | | |
| QUANTREG | Default | | | |
| QUANTSELECT | Default | Yes | Yes | All |
| RMTSREG | Default | Yes | Yes | All |
| ROBUSTREG | Default | | | |
| SURVEYLOGISTIC | Yes | Yes | **Default** | All |
| SURVEYPHREG | Default | Yes | Yes | All |
| SURVEYREG | Default | | | |
| TRANSREG | Yes | **Default** | Yes | |

A few comments:

- The REFERENCE encoding is the default for PHREG and TRANSREG.
- The EFFECT encoding is the default for CATMOD, LOGISTIC, and SURVEYLOGISTIC.
- The HP regression procedures all use the GLM encoding by default and support only PARAM=GLM or PARAM=REF. The HP regression procedures include HPFMM, HPGENSELECT, HPLMIXED, HPLOGISTIC, HPNLMOD, HPPLS, HPQUANTSELECT, and HPREG. In spite of its name, GAMPL is also an HP procedure. In spite of its name, HPMIXED is NOT an HP procedure!
- PROC LOGISTIC and PROC HPLOGISTIC use different default encodings.
- CATMOD does not have a CLASS statement because all variables are assumed to be categorical.
- PROC TRANSREG does not support a CLASS statement. Instead, it uses a CLASS() transformation list. It uses different syntax to support parameter encodings.
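
Because PROC LOGISTIC and PROC HPLOGISTIC have different defaults, their parameter estimates for a classification effect are not directly comparable unless you request the same encoding in both. The following sketch requests the GLM encoding explicitly in PROC LOGISTIC so that it matches the HPLOGISTIC default. The data set and variables are assumptions chosen only for illustration.

```sas
/* Illustrative sketch: Sashelp.Heart and the model are assumptions */
proc logistic data=sashelp.heart;
   class Sex / param=glm;   /* override the default EFFECT encoding */
   model Status(event='Dead') = Sex AgeAtStart;
run;

proc hplogistic data=sashelp.heart;
   class Sex;               /* GLM encoding is already the default here */
   model Status(event='Dead') = Sex AgeAtStart;
run;
```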

### How to interpret main effects for the SAS encodings

The GLM parameterization is a singular parameterization. The other encodings are nonsingular. The "Other Parameterizations" section of the documentation gives a simple one-sentence summary of how to interpret the parameter estimates for the main effects in each encoding:

- The GLM encoding estimates the difference in the effect of each level compared to the reference level. You can use the REF= option to specify the reference level. By default, the reference level is the last ordered level. The design matrix for the GLM encoding is singular.
- The REFERENCE encoding estimates the difference in the effect of each nonreference level compared to the effect of the reference level. You can use the REF= option to specify the reference level. By default, the reference level is the last ordered level. Notice that the REFERENCE encoding gives the same interpretation as the GLM encoding. The difference is that the design matrix for the REFERENCE encoding excludes the column for the reference level, so the design matrix for the REFERENCE encoding is (usually) nonsingular.
- The EFFECT encoding estimates the difference in the effect of each nonreference level compared to the average effect over all levels.
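
One way to see these differences is to write out the design matrix for each encoding and compare the columns. The sketch below uses the OUTDESIGN= option in PROC GLMSELECT, which supports all encodings. The data set (Sashelp.Class), the response variable, and the output data set names are assumptions chosen for illustration.

```sas
/* Illustrative sketch: write the design matrix for three encodings.
   Sex has two levels (F, M); by default the last level (M) is the reference. */
proc glmselect data=sashelp.class noprint
               outdesign(addinputvars)=DesignGLM;
   class Sex / param=glm;      /* one dummy column per level (singular) */
   model Weight = Sex / selection=none;
run;

proc glmselect data=sashelp.class noprint
               outdesign(addinputvars)=DesignRef;
   class Sex / param=ref;      /* column for the reference level is omitted */
   model Weight = Sex / selection=none;
run;

proc glmselect data=sashelp.class noprint
               outdesign(addinputvars)=DesignEffect;
   class Sex / param=effect;   /* reference level is coded as -1 */
   model Weight = Sex / selection=none;
run;

proc print data=DesignGLM(obs=5);    run;
proc print data=DesignRef(obs=5);    run;
proc print data=DesignEffect(obs=5); run;
```

For a two-level variable such as Sex, the GLM design should contain two dummy columns, whereas the REFERENCE and EFFECT designs should each contain a single column.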

This article lists the encodings that are supported by each SAS regression procedure. I hope you will find it to be a useful reference. If I've missed your favorite regression procedure, let me know in the comments.
