A previous article about how to display missing values in SAS prompted a comment about special missing values in ODS tables in SAS. Did you know that statistical tables in SAS include special missing values to represent certain situations in statistical analyses? This article explains how to interpret four special missing values that appear in SAS statistical tables.
Special missing values in SAS
SAS supports one character missing value, which is the blank string. However, SAS supports 28 numerical missing values that you can use as the value of a numerical variable:
- The "ordinary" (and most common) missing value is represented by a dot (.). In data, it often indicates a numerical value that was not collected. For example, in survey data, a missing value can be used to indicate that a participant did not answer a question. By default, the ordinary missing value is displayed as a dot, but you can use the techniques from the previous article to display a different symbol.
- There are 26 "special" missing values that are formed by using a dot followed by a letter in the English alphabet: .A, .B, ..., .Z. For example, for participants in a weight-loss program that collects weight every week, you can use .A to indicate that a participant was absent for a scheduled meeting, use .R to indicate that the participant refused to be weighed, and use .X to indicate that the scale was broken. These values are displayed as the capital letter used to represent the missing value (for example, 'A', 'R', and 'X').
- The special missing value that uses a dot followed by an underscore (._). This missing value is displayed as an underscore in numerical tables.
The following DATA step reads in several missing and non-missing values. You can use the MISSING statement in the SAS DATA step to read numerical variables that contain special missing values. You might want to detect missing values in DATA step programming statements. There are two ways to detect missing values: by using the logical comparison operators (chiefly, EQ, NE) or by using the MISSING function, as follows:
/* read and detect missing values */ data Miss; missing A R X _; input Name $ weight; isEqualDot = (weight=.); /* is it the ordinary missing value? */ isMissing = missing(weight); /* is it any missing value? */ label weight="Weight (kg)"; datalines; Alison 54 Bob 100 Camille . Devra 82 Elija A Fred 78 Grace R Hassan 90 Jeremy _ Karena X ; proc print data=Miss; run; |
The DATA step uses the MISSING statement to define valid missing values. If you do not use the MISSING statement, the log will display NOTE: Invalid data for weight and an ordinary missing value (.) will be used instead of a special missing value. The call to PROC PRINT shows how missing values are displayed by default. An ordinary missing value is displayed as a dot; special missing values are displayed by using an upper-case character or an underscore.
Notice what happens if you compare a missing value to a dot by using a logical statement such as IF value=. THEN .... In the DATA step, the logical condition is true only if the value is an ordinary missing value. If you want to detect a specific special missing values, you can use logic such as IF value=.A THEN .... To detect ANY missing value, use the MISSING function, as in IF missing(value) THEN ....
Special missing values in statistical tables
The statistical tables in SAS use four types of missing values:
- The ordinary missing value (.) indicates that a statistic could not be computed. This can occur for mathematical reasons (such as division by zero), for statistical reasons (the standard deviation of a single observation is undefined), or for computational reasons (an algorithm did not converge). In statistical tables, the ordinary missing value is rendered as a dot, by default.
- The special missing value ._ indicates a cell in a table that is intentionally empty because of the structure of the table. For example, the lower-right cells of an ANOVA table are traditionally left blank. In statistical tables, the value ._ is rendered as a blank (' ').
- The special missing value .I indicates positive infinity. In statistical tables, it is rendered as 'Infty'. For example, an upper one-sided confidence interval might be displayed as [27, Infty].
- The special missing value .M indicates negative infinity. In statistical tables, it is rendered as '-Infty'. For example, a lower one-sided confidence interval might be displayed as [-Infty, 27].
The following example shows that the blanks in an ANOVA table are represented by using the special "underscore" missing value. You can use the ODS OUTPUT statement to save the table to a SAS data set, then use PROC PRINT to show the underlying structure, as follows:
proc glm data=sashelp.class plots=none; class Sex; model Height = Sex; ods select OverallANOVA; ods output OverallANOVA=ANOVA; quit; proc print data=ANOVA; run; |
The output from PROC PRINT shows that the lower-right corner of the ANOVA table contains the underscore missing value.
In a similar way, you can request a one-sided confidence interval for a t-test by calling PROC TTEST:
proc ttest data=sashelp.class h0=60 sides=L plots=none; var Height; ods select ConfLimits; ods output ConfLimits=TTest; run; proc print data=TTest; run; |
The output from PROC PRINT shows that the value for the LowerCLMean variable is the special missing value, .M.
Why should you care?
You need to know about these missing values if you use the ODS OUTPUT statement to save an ODS table and then use the DATA step to process the value. The special missing values have special meaning: plus infinity, minus infinity, or "structurally empty" cell. As shown in the first section, you can use the MISSING function to detect any missing value. But you can also test for special missing values by using syntax such as IF value=._ THEN ....
That's cool! Can I format missing values like that?
You might wonder how the ODS tables know to display the special missing values in a special way. When you use PROC TEMPLATE to define a table, you can use the TRANSLATE statement (sometimes called the TRANSLATE INTO statement) to format cells in a table. If you want to create your own tables that mimic the behavior of the SAS statistical tables, you can use the following TRANSLATE statements when you define the column template for a table:
translate _val_=._ into "", _val_=.I into " Infty", _val_=.M into "-Infty"; |
For an example of a template that uses these statements, use PROC TEMPLATE to display the source code that defines the ConfLimits table in PROC TTEST:
proc template; source Stat.TTest.ConfLimits; run; |
The table definition appears in the SAS log. Near the top of the definition, you will find the TRANSLATE statement that handles the display of special values.
Summary
SAS supports 28 different missing values. You can use them to represent different reasons that a data value is missing. In ODS statistical table, SAS uses the ordinary missing value and three special missing values: ._, .M. and .I. These special missing values are displayed in tables as a blank (" "), as "Infty", or as "-Infty", respectively.