Have you ever heard of the DOLIST syntax? You might know the syntax even if you are not familiar with the name. The DOLIST syntax is a way to specify a list of numerical values to an option in a SAS procedure. Applications include:
- Specify the end points for bins of a histogram
- Specify percentiles to be output to a data set
- Specify tick marks for a custom axis on a graph
- Specify the location of reference lines on a graph
- Specify a list of parameters for an algorithm. Examples include smoothing parameters (the SMOOTH= option in PROC LOESS), sample sizes (the NTOTAL= option in PROC POWER), and initial guesses for parameters in an optimization (the PARMS statement in PROC NLMIXED and PROC NLIN)
This article demonstrates how to use the DOLIST syntax to specify a list of values in SAS procedures.
The DOLIST syntax enables you to write a single statement that specifies individual values and one or more sequences of values. The DOLIST syntax should be in the toolbox of every intermediate-level SAS programmer!
The DOLIST syntax in the SAS DATA step
According to the documentation of PROC POWER, the syntax described in this article is sometimes called the DOLIST syntax because it is based on the syntax for the iterative DO loop in the DATA step.
The most common syntax for a DO loop is DO x = start TO stop BY increment. For example, DO x = 10 TO 90 BY 20; iterates over the sequence of values 10, 30, 50, 70, and 90. If the increment is 1, you can omit the BY increment portion of the statement. However, you can also specify values as a comma-separated list, such as DO x = 10, 30, 50, 70, 90;, which generates the same values. What you might not know is that you can combine these two methods. For example, in the following DATA step, the values are specified by using two comma-separated lists and three sequences. For clarity, I have placed each list on a separate line, but that is not necessary:
/* the DOLIST syntax for a DO loop in the DATA step */
data A;
do pctl = 5,                /* individual value(s)  */
          10 to 50 by 20,   /* a sequence of values */
          54.3, 69.1,       /* individual value(s)  */
          80 to 90 by 5,    /* another sequence     */
          60 to 40 by -20;  /* yet another sequence */
   output;
end;
run;

proc print; run;
The output (not shown) is a list of values: 5, 10, 30, 50, 54.3, 69.1, 80, 85, 90, 60, 40. Notice that the values do not need to be in sorted order, although they often are.
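As a quick check of the claim that the sequence form and the comma-separated form generate the same values, here is a minimal sketch. The data set names SeqForm and ListForm are made up for this illustration; PROC COMPARE should report that the two data sets have no unequal values.

```sas
/* Sketch: two ways to generate 10, 30, 50, 70, 90 in a DATA step.
   SeqForm and ListForm are hypothetical data set names. */
data SeqForm;
   do x = 10 to 90 by 20;        /* sequence form */
      output;
   end;
run;

data ListForm;
   do x = 10, 30, 50, 70, 90;    /* comma-separated form */
      output;
   end;
run;

/* compare the two data sets value by value */
proc compare base=SeqForm compare=ListForm;
run;
```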
The expressions to the right of the equal sign are what I mean by the "DOLIST syntax." You can use the same syntax to specify a list of values for an option in many SAS procedures. When the SAS documentation says that an option takes a "list of values," you can often use a comma-separated list, a space-separated list, and the syntax start TO stop BY increment. (Or a combination of these expressions!) The following sections provide a few examples, but there are literally hundreds of options in SAS that support the DOLIST syntax!
Some procedures (for example, PROC SGPLOT) require the DOLIST values to be in parentheses. Consequently, I have adopted the convention of always using parentheses around DOLIST values, even if the parentheses are not strictly required. As far as I know, it is never wrong to put the DOLIST inside parentheses, and it keeps me from having to remember whether parentheses are required. The examples in this article all use parentheses to enclose DOLIST values.
Histogram bins and percentiles
You can use the DOLIST syntax to specify the endpoints of bins in a histogram. For example, in PROC UNIVARIATE, the ENDPOINTS= option in the HISTOGRAM statement supports a DOLIST. Because histograms use evenly spaced bins, usually you will specify only one sequence, as follows:
proc univariate data=sashelp.cars;
   var weight;
   histogram weight / endpoints=(1800 to 7200 by 600);  /* DOLIST sequence expression */
run;
You can also use the DOLIST syntax to specify percentiles. For example, the PCTLPTS= option on the OUTPUT statement enables you to specify which percentiles of the data should be written to a data set:
proc univariate data=sashelp.cars;
   var MPG_City;
   output out=UniOut pctlpre=P_ pctlpts=(50 75, 95 to 100 by 2.5);  /* DOLIST */
run;
Notice that this example specifies both individual percentiles (50 and 75) and a sequence of percentiles (95, 97.5, 100).
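To see which percentiles were written, you can print the output data set. The following sketch repeats the call above with the NOPRINT option (an addition for this illustration) and then prints UniOut. I would expect one column per requested percentile, with names built from the P_ prefix, but check the names in your own output rather than relying on that assumption.

```sas
/* Sketch: write the requested percentiles to UniOut and display them */
proc univariate data=sashelp.cars noprint;   /* NOPRINT suppresses the usual tables */
   var MPG_City;
   output out=UniOut pctlpre=P_
          pctlpts=(50 75, 95 to 100 by 2.5); /* 50, 75, 95, 97.5, 100 */
run;

proc print data=UniOut noobs;                /* one row, one column per percentile */
run;
```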
Tick marks and reference lines
The SGPLOT procedure enables you to specify the locations of tick marks on the axis of a graph. Most of the time you will specify an evenly spaced set of values, but (just for fun) the following example shows how you can use the DOLIST syntax to combine evenly spaced values and a few custom values:
title "Specify Ticks on the Y Axis";
proc sgplot data=sashelp.cars;
   scatter x=Weight y=Mpg_City;
   yaxis grid values=(10 to 40 by 5, 50 60);  /* DOLIST; commas optional */
run;
As shown in the previous example, the GRID option on the XAXIS and YAXIS statements enables you to display reference lines at each tick location. However, sometimes you want to display reference lines independently from the tick marks. In that case, you can use the REFLINE statement, as follows:
title "Many Reference Lines";
proc sgplot data=sashelp.cars;
   scatter x=Weight y=MPG_City;
   refline (1800 to 6000 by 600, 7000) / axis=x;  /* many reference lines */
run;
Statistical procedures
Many statistical procedures have options that support lists. In most cases, you can use the DOLIST syntax to provide values for the list.
I have already written about how to use the DOLIST syntax to specify initial guesses for the PARMS statement in PROC NLMIXED and PROC NLIN. The documentation for the POWER procedure discusses how to specify lists of values and uses the term "DOLIST" in its discussion.
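For the NTOTAL= option in PROC POWER, a minimal sketch might look like the following. The mean and standard deviation are invented values chosen only to make the call complete, and the list is written without parentheses or commas, which is the form I recall from the PROC POWER documentation. The call should request a power computation for each of the sample sizes 30, 50, 70, and 100.

```sas
/* Sketch: four sample-size scenarios from one list.
   The mean and standard deviation are made-up values for illustration. */
proc power;
   onesamplemeans
      mean   = 5                    /* hypothetical mean under the alternative */
      stddev = 10                   /* hypothetical standard deviation */
      ntotal = 30 to 70 by 20 100   /* sequence 30, 50, 70 plus the single value 100 */
      power  = .;                   /* missing value: solve for power */
run;
```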
Some statistical procedures enable you to specify multiple parameter values, and the analysis is repeated for each parameter in the list. One example is the SMOOTH= option in the MODEL statement of the LOESS procedure. The SMOOTH= option specifies values of the loess smoothing parameter. The following call to PROC LOESS fits four loess smoothers to the data. The call to PROC SGPLOT overlays the smoothers on a scatter plot of the data:
title "Multiple Parameters in PROC LOESS";
proc loess data=sashelp.cars plots=none;
   model MPG_City = Weight / smooth=(0.1 to 0.5 by 0.2, 0.75);  /* value-list */
   output out=LoessOut P=Pred;
run;

proc sort data=LoessOut;
   by SmoothingParameter Weight;
run;

proc sgplot data=LoessOut;
   scatter x=Weight y=MPG_City / transparency=0.9;
   series x=Weight y=Pred / group=SmoothingParameter curvelabel curvelabelpos=min;
run;
Summary
In summary, this article describes the DOLIST syntax in SAS, which enables you to simultaneously specify individual values and evenly spaced sequences of values. A sequence is specified by using the start TO stop BY increment syntax. The DOLIST syntax is valid in many SAS procedures and in the DATA step. In some procedures (such as PROC SGPLOT), the syntax needs to be inside parentheses. For readability, you can use commas to separate individual values and sequences.
Many SAS procedures accept the special syntax even if it is not explicitly mentioned in the documentation. In the documentation for an option, look for terms such as value-list or numlist or value-1 <...value-n>, which indicate that the option supports the DOLIST syntax.
7 Comments
Whoa! I've been using (and loving) DOLIST syntax for years in DATA step DO loops, totally unaware of the power and flexibility available. This VERY useful post proves once again that you CAN teach an old dog (or at least an old SAS programmer) new tricks :-) Thanks, Rick!
I've been using them for quite some time in graphic and stat procs but didn't know they have a name :-) Thanks Rick!
How many reports support DO list syntax with uneven BY-groups for duration intervals like year, month or quarter?
Or even strings for DISCRETE groupings?
Thanks for this insight. I checked in the documentation (SAS 9.4 M5) for the data step, do iterative. Indeed, by careful reading it is possible to deduce this syntax but it is not easy. May I suggest improving the documentation of this. Describe it formally, give it a name, as you have done, and refer to it from elsewhere in the documentation. It belongs as a component in "Language reference: concepts > SAS system concepts". Look at any documentation for a SQL database product (not SAS) to see what I mean.
The data step documentation mentions in an informal way the use of WHILE and UNTIL with iterative loops. Do these work in the contexts you describe? The documentation should be revised to make this unambiguously clear.
SAS is particularly in need of clear documentation as it does not have a small well-defined language grammar. Without good documentation for text-based languages all we can do is guess or experiment - or wait for you to blog about it.
Hi Peter,
I am a technical writer at SAS and will be updating the documentation for the SAS Statements doc that you mention. I have been working on getting it updated based on your feedback. I was already in the process of adding a new chapter to the document you mentioned called "Conditionals and Loops", so it was perfect timing that you mentioned this.
You mentioned to "Look at any documentation for a SQL database product (not SAS) to see what I mean." I want to make sure that I have all the info I need ... :-) Can you give me a link/example? Just so I can compare what I have so far?
Thank you so much for your feedback!
Lisa
Thank you for sharing this very useful feature, Rick!
Unfortunately, the last time I checked, GTL does not follow any of the customary rules.