SAS author's tip: Check data for errors with summary statistics

This week's SAS tip is from Sandra Schlotzhauer and her book Elementary Statistics Using SAS. Besides being a long-time SAS user, Sandra has extensive experience teaching basic statistics to non-statisticians. Her accessible style resonates with SAS users, including reviewers of the book.

The following excerpt is from SAS Press author Sandra Schlotzhauer and her book "Elementary Statistics Using SAS" Copyright © 2009, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED. (Please note that results may vary depending on your version of SAS software.)

Using Summary Statistics to Check Data for Errors

As a final task of summarizing continuous variables, think about checking the data for errors. The scatter plot matrix is an effective way to find potential outlier points. Chapter 4 discussed using box plots to find outlier points, and it recommended checking the minimum and maximum values to confirm that they seem reasonable. You could use PROC UNIVARIATE to produce a report of summary statistics for each variable, but PROC CORR can be more efficient because it summarizes all of the variables in a single compact table:

ods select SimpleStats;
proc corr data=kilowatt;
   var ac dryer kwh;
   title 'Summary Statistics for KILOWATT Data Set';
run;

The ODS statement specifies that PROC CORR print only the simple summary statistics table. The VAR statement lists all of the variables that you want to summarize.
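The name SimpleStats refers to one of the output objects that PROC CORR creates. If you are unsure of a table's name, one way to find it is ODS TRACE, which writes the name of each output object to the SAS log. The sketch below assumes a small hypothetical data set, WORK.KW_DEMO, with invented values standing in for KILOWATT.

/* Hypothetical stand-in for the KILOWATT data set; values are invented for illustration */
data work.kw_demo;
   input ac dryer kwh;
   datalines;
15 2 35
18 3 40
10 1 30
;
run;

/* ODS TRACE ON writes the name of each output object, such as
   SimpleStats and PearsonCorr, to the SAS log */
ods trace on;
proc corr data=work.kw_demo;
   var ac dryer kwh;
run;
ods trace off;

In the log, look for the Name: line of each output object. Any of those names can go in an ODS SELECT statement, which is how the example above limits the report to the Simple Statistics table.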

Figure 10.5 shows the results.

Figure 10.5: Summary Statistics from PROC CORR

SAS identifies the variables under the Variable heading in the Simple Statistics table. See Chapter 4 for definitions of the statistics in the remaining columns (N, Mean, Std Dev, Sum, Minimum, and Maximum).
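When you check for errors, the Minimum and Maximum columns deserve the closest look. If a value seems implausible, one way to find the observation behind it is a WHERE statement in PROC PRINT. This is a sketch under stated assumptions: the data set WORK.KW_CHECK, its DAY variable, the mistyped kwh value, and the cutoff of 100 are all hypothetical, chosen only to illustrate the technique.

/* Hypothetical data: the 400 on day 4 simulates a data-entry error (40 typed as 400) */
data work.kw_check;
   input day ac dryer kwh;
   datalines;
1 15 2 35
2 18 3 40
3 10 1 30
4 16 2 400
;
run;

/* The Maximum for kwh stands out in the summary statistics */
ods select SimpleStats;
proc corr data=work.kw_check;
   var ac dryer kwh;
run;

/* List the observations behind the suspicious maximum; 100 is an assumed cutoff */
proc print data=work.kw_check;
   where kwh > 100;
   var day ac dryer kwh;
run;

Once the observation is located, you can compare it with the original records and correct the value before continuing the analysis.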

SAS uses all nonmissing values of each variable to calculate these statistics, so N can differ from one variable to another. Suppose the homeowner forgot to collect dryer information on one day, resulting in only 20 values for the dryer variable. The Simple Statistics table would show N as 20 for dryer and N as 21 for the other two variables.
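To see this behavior on a small scale, the sketch below uses a hypothetical four-day data set (invented values, not the KILOWATT data) in which a period marks the missing dryer value. For contrast, the second step adds the PROC CORR NOMISS option, which excludes any observation that has a missing value for one of the VAR variables.

/* Hypothetical data: dryer was not recorded on day 3 (a period means missing) */
data work.kw_missing;
   input ac dryer kwh;
   datalines;
15 2 35
18 3 40
10 . 30
16 2 38
;
run;

/* Default: each variable's statistics use all of its nonmissing values,
   so N is 3 for dryer and 4 for ac and kwh */
ods select SimpleStats;
proc corr data=work.kw_missing;
   var ac dryer kwh;
run;

/* NOMISS drops any observation with a missing VAR value,
   so N is 3 for all three variables */
ods select SimpleStats;
proc corr data=work.kw_missing nomiss;
   var ac dryer kwh;
run;

The ODS SELECT statement is repeated because a selection list applies only to the next procedure step. Comparing the two N columns shows how many observations a missing value would remove from a complete-case analysis.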

Visit Sandra Schlotzhauer's author page for bonus content, including a free chapter from her book. And take a look at previously featured tips from Elementary Statistics Using SAS on this blog: "Using the NOSTAT option in charts" and "Understanding the WHERE statement."

