Encodings of CLASS variables in SAS regression procedures: A cheat sheet

SAS regression procedures support several parameterizations of classification variables. When a categorical variable is used as an explanatory variable in a regression model, the procedure generates dummy variables that are used to construct a design matrix for the model. The process of forming columns in a design matrix is called a parameterization or encoding. In SAS, most regression procedures use either the GLM encoding, the EFFECT encoding, or the REFERENCE encoding. This article summarizes the default and optional encodings for each regression procedure in SAS/STAT. In many SAS procedures, you can use the PARAM= option to change the default encoding.

The documentation section "Parameterization of Model Effects" provides a complete list of the encodings in SAS and shows how the design matrices are constructed from the levels. (The levels are the values of a classification variable.) Pasta (2005) gives examples and further discussion.
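To make the construction concrete, here is a minimal sketch in Python (not SAS) of the dummy columns that the three most common encodings generate for a single classification variable. It assumes that the levels are sorted and that the reference level is the last ordered level. The function name `encode` and the example levels are hypothetical and chosen only for illustration, and the intercept column is not shown.

```python
def encode(levels, value, param="GLM"):
    """Return the dummy columns for one observation of a CLASS variable.

    Assumption: the reference level is the last level in sorted order.
    """
    ordered = sorted(levels)
    ref = ordered[-1]
    if param == "GLM":
        # One indicator column per level, including the reference level.
        return [1 if value == lev else 0 for lev in ordered]
    nonref = ordered[:-1]
    if param == "REF":
        # Indicator columns for the nonreference levels only;
        # the reference level is coded as all zeros.
        return [1 if value == lev else 0 for lev in nonref]
    if param == "EFFECT":
        # Same columns as REF, except the reference level is coded as all -1.
        if value == ref:
            return [-1] * len(nonref)
        return [1 if value == lev else 0 for lev in nonref]
    raise ValueError(f"unsupported encoding: {param}")


levels = ["A", "B", "C"]  # hypothetical levels; "C" is the reference level
for param in ("GLM", "REF", "EFFECT"):
    print(param, {lev: encode(levels, lev, param) for lev in levels})
```

For three levels, the GLM encoding produces three columns, whereas the REFERENCE and EFFECT encodings produce two. The last two differ only in the row for the reference level.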

Default and optional encodings for SAS regression procedures

The following SAS regression procedures support the CLASS statement or a similar syntax. The columns GLM, REFERENCE, and EFFECT indicate the three most common encodings. The word "Default" indicates the default encoding, and "Yes" indicates an encoding that is supported but is not the default. For procedures that support the PARAM= option, the PARAM= column lists the supported encodings. The word "All" means that the procedure supports the complete list of SAS encodings. Most procedures default to using the GLM encoding; the exceptions are CATMOD, LOGISTIC, PHREG, SURVEYLOGISTIC, and TRANSREG.

Procedure	GLM	REFERENCE	EFFECT	PARAM=
ADAPTIVEREG	Default
ANOVA	Default
BGLIMM	Default	Yes	Yes	GLM | EFFECT | REF
CATMOD			Default
FMM	Default
GAM	Default
GAMPL	Default	Yes		GLM | REF
GEE	Default
GENMOD	Default	Yes	Yes	All
GLIMMIX	Default
GLM	Default
GLMSELECT	Default	Yes	Yes	All
HP regression procedures	Default	Yes		GLM | REF
HPMIXED	Default
ICPHREG	Default	Yes	Yes	All
LIFEREG	Default
LOGISTIC	Yes	Yes	Default	All
MIXED	Default
ORTHOREG	Default	Yes	Yes	All
PLS	Default
PROBIT	Default
PHREG	Yes	Default	Yes	All
QUANTLIFE	Default
QUANTREG	Default
QUANTSELECT	Default	Yes	Yes	All
RMTSREG	Default	Yes	Yes	All
ROBUSTREG	Default
SURVEYLOGISTIC	Yes	Yes	Default	All
SURVEYPHREG	Default	Yes	Yes	All
SURVEYREG	Default
TRANSREG	Yes	Default	Yes

A few comments:

The REFERENCE encoding is the default for PHREG and TRANSREG.
The EFFECT encoding is the default for CATMOD, LOGISTIC, and SURVEYLOGISTIC.
The HP regression procedures all use the GLM encoding by default and support only PARAM=GLM or PARAM=REF. The HP regression procedures include HPFMM, HPGENSELECT, HPLMIXED, HPLOGISTIC, HPNLMOD, HPPLS, HPQUANTSELECT, and HPREG. In spite of its name, GAMPL is also an HP procedure. In spite of its name, HPMIXED is NOT an HP procedure!
PROC LOGISTIC and PROC HPLOGISTIC use different default encodings: LOGISTIC uses the EFFECT encoding, whereas HPLOGISTIC uses the GLM encoding.
CATMOD does not have a CLASS statement because all variables are assumed to be categorical.
PROC TRANSREG does not support a CLASS statement. Instead, it uses a CLASS() transformation list. It uses different syntax to support parameter encodings.

How to interpret main effects for the SAS encodings

The GLM parameterization is a singular parameterization. The other encodings are nonsingular. The "Other Parameterizations" section of the documentation gives a simple one-sentence summary of how to interpret the parameter estimates for the main effects in each encoding:

The GLM encoding estimates the difference in the effect of each level compared to the reference level. You can use the REF= option to specify the reference level. By default, the reference level is the last ordered level. The design matrix for the GLM encoding is singular.
The REFERENCE encoding estimates the difference in the effect of each nonreference level compared to the effect of the reference level. You can use the REF= option to specify the reference level. By default, the reference level is the last ordered level. Notice that the REFERENCE encoding gives the same interpretation as the GLM encoding. The difference is that the design matrix for the REFERENCE encoding excludes the column for the reference level, so the design matrix for the REFERENCE encoding is (usually) nonsingular.
The EFFECT encoding estimates the difference in the effect of each nonreference level compared to the average effect over all levels.
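The following Python sketch (an illustration, not SAS output) shows both points for a hypothetical three-level variable with made-up level means. It first checks that the GLM dummy columns sum to the intercept column, which is why that design matrix is singular. It then computes the main-effect parameters that the REFERENCE and EFFECT encodings would give for a one-way model that reproduces the level means exactly. The level names, the means, and the function names are assumptions chosen for illustration.

```python
# Hypothetical mean response for each level of a CLASS variable.
level_means = {"A": 10.0, "B": 14.0, "C": 18.0}
ordered = sorted(level_means)
ref = ordered[-1]  # assume the default: the last ordered level is the reference

# GLM encoding: one indicator column per level. In every row the indicators
# sum to 1, which duplicates the intercept column, so the columns of
# [intercept | indicators] are linearly dependent.
glm_rows = [[1 if lev == col else 0 for col in ordered] for lev in ordered]
glm_is_singular = all(sum(row) == 1 for row in glm_rows)


def reference_params(means, ref):
    """Intercept is the reference-level mean; each parameter is a level mean minus it."""
    intercept = means[ref]
    return intercept, {lev: m - intercept for lev, m in means.items() if lev != ref}


def effect_params(means, ref):
    """Intercept is the average of the level means; each parameter is a level mean minus it."""
    intercept = sum(means.values()) / len(means)
    return intercept, {lev: m - intercept for lev, m in means.items() if lev != ref}


print("GLM columns sum to the intercept:", glm_is_singular)
print("REFERENCE:", reference_params(level_means, ref))
print("EFFECT:   ", effect_params(level_means, ref))
```

With these made-up means, the REFERENCE parameters for levels A and B are -8 and -4 (differences from level C), whereas the EFFECT parameters are -4 and 0 (differences from the average of 14).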

This article lists the encodings that are supported for each SAS regression procedure. I hope you will find it to be a useful reference. If I've missed your favorite regression procedure, let me know in the comments.
