I'm addicted to you.
You're a hard habit to break.
Such a hard habit to break.
— Chicago, "Hard Habit To Break"
I started thinking about superfluous statements in the SAS language. Some programmers might argue that if the program still runs correctly, then unnecessary statements are inconsequential. However, as a general rule I think it is a good programming practice to avoid writing unnecessary statements.
Here are a few examples of unnecessary SAS statements. Can you think of more?
A RUN statement after a DATALINES statement
The doc for the DATALINES statement in the SAS DATA step states: "The DATALINES statement is the last statement in the DATA step. Use a null statement (a single semicolon) to indicate the end of the input data." In other words, you do not need a RUN statement after the semicolon to run the DATA step. The following example runs correctly and creates a data set:
data A;
input x @@;
datalines;
1 2 3 4 5 6
;                  /* <== no RUN statement required */
How many times have you seen a RUN statement after a DATALINES statement? Countless! I've even seen examples in the SAS documentation that use this unnecessary statement.
A semicolon after a macro call
If you define a macro that contains a complete set of valid SAS statements, you do not need another semicolon when you call the macro. For example, the following call is valid:
%macro TOP(dsname);
   proc print data=&dsname(obs=5); run;
%mend;

%TOP(sashelp.class)       /* <== no semicolon required */
It's not a big deal if you type the semicolon because a semicolon is the null statement. It has no performance implications. But for some reason it bothers me when I catch myself doing it.
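As a hedged aside, there is one context where the extra semicolon is more than a matter of taste: a macro call that sits between IF-THEN and ELSE. The sketch below uses a hypothetical macro named SETFLAG (not from any SAS library) that generates one complete assignment statement. Because the macro already supplies the terminating semicolon, the call inside the IF-THEN/ELSE block is written without one.

```sas
/* SETFLAG is a hypothetical macro used only for illustration. */
/* It generates one complete statement, including the semicolon. */
%macro SETFLAG(value);
   flag = &value;
%mend;

data Flags;
   set sashelp.class;
   if age > 13 then %SETFLAG(1)   /* expands to:  flag = 1;  */
   else %SETFLAG(0)               /* expands to:  flag = 0;  */
run;
```

If you add a semicolon after %SETFLAG(1), the generated code becomes "flag = 1;;" and the null statement separates the ELSE from its IF-THEN, which SAS reports as an error.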
A RUN statement in a fully interactive procedure
In a fully interactive procedure, statements are executed immediately. The RUN statement has no effect. Examples include PROC IML, PROC SQL, and PROC OPTMODEL. You use the QUIT statement to exit these procedures, which means that the RUN statement is never needed. The following program is correct and runs three statements. In interactive mode, each statement gets run when the SAS parser reaches the semicolon that ends the statement.
proc sql;
create table Example (x num, y num);          /* statement 1 */
insert into Example
   values(1, 2)  values(3, 4)  values(5, 6);  /* statement 2 */
select * from Example;                        /* statement 3 */
/* no RUN statement required! */
A RUN statement in (some) procedures that support RUN-group processing
Some SAS procedures are partly interactive. Procedures such as PROC DATASETS, PROC REG, and PROC GLM support RUN-group processing. For these procedures, the RUN statement defines blocks of statements that get executed, but the procedure remains running until it encounters a QUIT statement.
Many SAS/STAT procedures interpret QUIT to mean "run the most recent statements and then quit." For these procedures, you do not need a RUN statement before you call QUIT. The following statements run a regression analysis and then quit the procedure:
proc glm data=sashelp.class;
model weight = height;
quit;   /* <== No RUN statement; Runs previous statements, then quits */
Unfortunately, SAS procedures are not completely consistent in implementing the QUIT statement. In some SAS procedures the QUIT statement means "ignore the most recent statements and quit." The canonical examples are the traditional SAS/GRAPH procedures such as PROC GPLOT. In the following program, the first PLOT statement creates a scatter plot because it is followed by a RUN statement. However, the second plot statement is not followed by a RUN statement, so it is ignored and the second plot is not produced.
proc gplot data=sashelp.class;
   plot weight*height;
run;    /* <== executes previous PLOT statement; does not quit */
   plot weight*age;
quit;   /* <== ignores previous PLOT statement, then quits */
/* use RUN; QUIT; to produce the second plot */
If you aren't sure how a procedure behaves with regard to RUN-group processing, it is always safe to use the RUN and QUIT statements in tandem.
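As a minimal sketch of the tandem pattern, here is the PROC GPLOT example again with a RUN statement added before QUIT. It assumes that SAS/GRAPH is licensed at your site. With the extra RUN statement, both RUN groups execute and both plots are produced.

```sas
proc gplot data=sashelp.class;
   plot weight*height;
run;          /* executes the first RUN group; the procedure stays active */
   plot weight*age;
run;          /* executes the second RUN group */
quit;         /* nothing is pending, so QUIT only exits the procedure */
```

The same RUN; QUIT; ending also works for procedures such as PROC GLM, where the RUN statement is merely redundant.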
When to include optional statements?
The previous sections describe unnecessary statements that I like to skip. However, sometimes I include optional statements in my programs for clarity, readability, or to practice defensive programming.
SAS supports many optional statements. When you omit an optional statement, the procedure performs a default behavior. For example, if you omit the VAR statement, most procedures run on all numeric variables (for example, PROC MEANS) or on all variables (for example, PROC PRINT). When I want the default behavior, I skip the VAR statement.
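To make the default concrete, here is a small sketch that uses the Sashelp.Class data set, which contains the numeric variables Age, Height, and Weight. The first call omits the VAR statement; the second call restricts the analysis to two variables.

```sas
/* No VAR statement: PROC MEANS analyzes every numeric variable */
proc means data=sashelp.class;
run;

/* Explicit VAR statement: analyze only Height and Weight */
proc means data=sashelp.class;
   var height weight;
run;
```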
Another statement that is technically optional is the RUN statement for a sequence of procedures. Because the next call to a procedure or DATA step will always end the previous procedure, you can technically omit the RUN statement for all but the last procedure. This means that the following program is valid, although I do not recommend this style of programming:
data class;
   set sashelp.class(where=(sex='M'));  /* 'class' is the _LAST_ data set */
proc means;                             /* DATA= _LAST_ */
proc print;                             /* DATA= _LAST_ */
run;
If I'm feeling lazy, I might write these statements during the early exploratory phase of a data analysis. However, for serious work I terminate every procedure by using a RUN or QUIT statement. Skipping a RUN statement can lead to undesirable interactions with global statements such as the TITLE statement and ODS statements.
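The TITLE interaction can be sketched as follows. The title text is made up for illustration. Because PROC MEANS is not terminated by a RUN statement, it does not execute until SAS encounters the PROC PRINT statement. By then the second TITLE statement has already taken effect, so the PROC MEANS output appears under the title that was intended for PROC PRINT.

```sas
title "Descriptive Statistics";
proc means data=sashelp.class;
                                  /* missing RUN: PROC MEANS has not executed yet */
title "First Five Observations";  /* global statement; replaces the first title   */
proc print data=sashelp.class(obs=5);
run;
title;                            /* clear the title */
```

Adding a RUN statement after the PROC MEANS statement makes each title apply to the output it was written for.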
There is much more that can be said about these topics. What are your thoughts?
- Are there unnecessary statements that you write out of habit?
- Are there optional statements that you always include because they make the program clearer?