PROC UNIVARIATE has provided confidence intervals for standard percentiles (quartiles) for eons. However, in SAS 9.3M2 (featuring the 12.1 analytical procedures) you can use a new feature in PROC UNIVARIATE to compute confidence intervals for a specified list of percentiles.
To be clear, percentiles and quantiles are essentially the same thing. For example, the median value of a set of data is the 0.5 quantile, which is also the 50th percentile. In general, the pth quantile is the (100 p)th percentile.
The CIPCTLDF option on the PROC UNIVARIATE statement produces distribution-free confidence intervals for the 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, and 99th percentiles as shown in the following example:
/* CI for standard percentiles: 1, 5, 10, 25, 50, 75, 90, 95, 99 */
ods select Quantiles;
proc univariate data=Sashelp.Cars cipctldf;
   var MPG_City;
run;
However, prior to the 12.1 release of the analytical procedures, there was no easy way to obtain confidence intervals for arbitrary percentiles. (Recall that you can specify nonstandard percentiles by using the PCTLPTS= option on the OUTPUT statement.)
I am happy to report that the OUTPUT statement in the UNIVARIATE procedure now supports the CIPCTLDF= option, which you can use as follows:
proc univariate data=sashelp.cars noprint;
   var MPG_City;
   output out=pctl pctlpts=2.5 20 80 97.5 pctlpre=p
          cipctldf=(lowerpre=LCL upperpre=UCL); /* 12.1 options (SAS 9.3m2) */
run;

proc print noobs;
run;
The CIPCTLDF= option computes distribution-free confidence intervals for the percentiles that are specified on the PCTLPTS= option. The LOWERPRE= option specifies the prefix to use for lower confidence limits; the UPPERPRE= option specifies the prefix to use for upper confidence limits.
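As a variation on the call above, here is a minimal sketch that requests 90% intervals instead of the default 95% intervals. It assumes that the CIPCTLDF= option accepts an ALPHA= suboption, as I understand the OUTPUT statement documentation to describe; the data set name pctl90 is just an illustrative choice. Check the documentation for your release before relying on it.

```sas
/* Sketch: 90% distribution-free CIs for nonstandard percentiles.    */
/* ASSUMPTION: CIPCTLDF= supports the ALPHA= suboption (SAS 9.3M2+). */
proc univariate data=sashelp.cars noprint;
   var MPG_City;
   output out=pctl90                        /* hypothetical data set name */
          pctlpts=2.5 20 80 97.5 pctlpre=p  /* same percentiles as before */
          cipctldf=(alpha=0.10 lowerpre=LCL upperpre=UCL);
run;

proc print data=pctl90 noobs;
run;
```

With PCTLPRE=p, the percentile variables get names such as p2_5 and p97_5 (the decimal point becomes an underscore). I would expect the confidence limits to follow the same pattern with the LCL and UCL prefixes.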
If your data are normally distributed, you can use the CIPCTLNORMAL= option on the OUTPUT statement to compute confidence limits. However, if your data are not normally distributed, the CIPCTLNORMAL= option might produce inaccurate results. For example, on the MPG_City data, which is highly skewed, the confidence intervals for large percentiles (like the 99th percentile) do not contain the corresponding point estimate. For this reason, I prefer the distribution-free intervals for most analyses.
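To check this on your own data, one option is to request both kinds of limits for the 99th percentile and compare each interval to the point estimate. The following is a sketch, not a definitive recipe. It assumes that CIPCTLNORMAL= accepts the same LOWERPRE= and UPPERPRE= suboptions as CIPCTLDF=, and the prefixes and data set names are arbitrary choices for illustration.

```sas
/* Sketch: compare distribution-free and normal-based limits for P99. */
/* ASSUMPTION: CIPCTLNORMAL= takes LOWERPRE=/UPPERPRE= like CIPCTLDF=. */
proc univariate data=sashelp.cars noprint;
   var MPG_City;
   /* distribution-free limits (based on order statistics) */
   output out=pctlDF pctlpts=99 pctlpre=p
          cipctldf=(lowerpre=DF_LCL upperpre=DF_UCL);
   /* limits that assume the data are normally distributed */
   output out=pctlNormal pctlpts=99 pctlpre=p
          cipctlnormal=(lowerpre=N_LCL upperpre=N_UCL);
run;

proc print data=pctlDF noobs;      /* estimate and distribution-free limits */
run;
proc print data=pctlNormal noobs;  /* estimate and normal-based limits */
run;
```

Each OUTPUT statement writes its own data set, so the two sets of limits stay separate and are easy to compare side by side. If the normal-based interval does not bracket the point estimate, that suggests the normality assumption is not reasonable for that variable.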