Visualize the 68-95-99.7 rule in SAS

8
Illustration of the 68-95-99.7 rule

A reader commented on last week's article about constructing symmetric intervals. He wanted to know if I created it in SAS.

Yes, the graph, which illustrates the so-called 68-95-99.7 rule for the normal distribution, was created by using several statements in the SGPLOT procedure in Base SAS

  • The SERIES statement creates the bell-shaped curve.
  • The BAND statement creates the shaded region under the curve.
  • The DROPLINE statement creates the vertical lines from the curve to the X axis.
  • The HIGHLOW statement creates the horizontal lines that indicate interval widths.
  • The TEXT statement creates the text labels "68%", "95%", and "99.7%".

If you follow some simple rules, it is easy to use PROC SGPLOT the overlay multiple curves and lines on a graph. The key is to organize the underlying data into a block form, as shown conceptually in the plot to the right. I use different variable names for each component of the plot, and I set the values of the other variables to missing when they are no longer relevant. Each statement uses only the variables that are relevant for that overlay.

The following DATA step creates the data for the 68-95-99.7 graph. Can you match up each section with the SGPLOT statements that overlay various components to create the final image?

data NormalPDF;
mu = 50; sigma = 8;     /* parameters for the normal distribution N(mu, sigma) */
/* 1. Data for the SERIES and BAND statements */
do m = -4 to 4 by 0.05;                /* x in [mu-4*sigma, mu+4*sigma] */
   x = mu + m*sigma;
   f = pdf("Normal", x, mu, sigma);    /* height of normal curve */
   output;
end;
x=.; f=.;
 
/* 2. Data for vertical lines at mu + m *sigma, m=-3, -2, -1, 1, 2, 3 */
do m =-3 to 3;
   if m=0 then continue;               /* skip m=0 */
   Lx = mu + m*sigma;                  /* horiz location of segment */
   Lf = pdf("Normal", Lx, mu, sigma);  /* vertical height of segment */
   output;
end;
LX = .; Lf = .;
 
/* 3. Data for horizontal lines. Heights are 1.1, 1.2, and  1.3 times the max height of the curve */
Tx = mu;                               /* text centered at mu */
fMax = pdf("Normal", mu, mu, sigma);   /* highest point of curve */
Text = "68%  ";                        /* 68% interval */
TL = mu - sigma;  TR = mu + sigma;     /* Left/Right endpoints of interval */
Ty = 1.1 * fMax;                       /* height of label and segment */
output;
 
Text = "95%  ";                        /* 95% interval */
TL = mu - 2*sigma;  TR = mu + 2*sigma; /* Left/Right endpoints of interval */
Ty = 1.2 * fMax;                       /* height of label and segment */
output;
 
Text = "99.7%";                        /* 99.7% interval */
TL = mu - 3*sigma;  TR = mu + 3*sigma; /* Left/Right endpoints of interval */
Ty = 1.3 * fMax;                       /* height of label and segment */
output;
 
keep x f Lx Lf Tx Ty TL TR Text;
run;
 
proc sgplot data=NormalPDF noautolegend;
   band     x=x upper=f lower=0;
   series   x=x y=f             / lineattrs=(color=black);
   dropline x=Lx y=Lf           / dropto=x lineattrs=(color=black);
   highlow  y=Ty low=TL high=TR / lowcap=serif highcap=serif lineattrs=(thickness=2);
   text     x=Tx y=Ty text=Text / backfill fillattrs=(color=white) textattrs=(size=14); 
   yaxis offsetmin=0 min=0 label="Density";
   xaxis values=(20 to 80 by 10) display=(nolabel);
run;
Share

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

8 Comments

  1. This looks like a great graphic, which I would like to modify to illustrate how to calculate one and two sided p-values. Unfortunately, neither "dropline" nor "text" statements work for me in 9.4 (TS1M0). Can you suggest a work-around?

  2. Pingback: Extreme values: What is an extreme value for normally distributed data? - The DO Loop

  3. Pingback: The math you learned in school: Yes, it’s useful! - The DO Loop

Leave A Reply

Back to Top