I'm addicted to you.
You're a hard habit to break.
Such a hard habit to break.
— Chicago, "Hard Habit To Break"
Habits are hard to break. For more than 20 years I've been putting semicolons at the end of programming statements in SAS, C/C++, and Java/JavaScript. But lately I've been working in a computer language that does not require semicolons. Nevertheless, my fingers have a mind of their own, and I catch myself typing unnecessary semicolons out of habit.
I started thinking about superfluous statements in the SAS language. Some programmers might argue that if the program still runs correctly, then unnecessary statements are inconsequential. However, as a general rule I think it is a good programming practice to avoid writing unnecessary statements.
Here are a few examples of unnecessary SAS statements. Can you think of more?
A RUN statement after a DATALINES statement
The doc for the DATALINES statement in the SAS DATA step states: "The DATALINES statement is the last statement in the DATA step. Use a null statement (a single semicolon) to indicate the end of the input data." In other words, you do not need a RUN statement after the semicolon to run the DATA step. The following example runs correctly and creates a data set:
data A;
input x @@;
datalines;
1 2 3 4 5 6
;                              /* <== no RUN statement required */
How many times have you seen a RUN statement after a DATALINES statement? Countless! I've even seen examples in the SAS documentation that use this unnecessary statement.
A semicolon after a macro call
If you define a macro that contains a complete set of valid SAS statements, you do not need another semicolon when you call the macro. For example, the following call is valid:
%macro TOP(dsname);
   proc print data=&dsname(obs=5); run;
%mend;

%TOP(sashelp.class)            /* <== no semicolon required */
It's not a big deal if you type the semicolon because a semicolon is the null statement. It has no performance implications. But for some reason it bothers me when I catch myself doing it.
A RUN statement in a fully interactive procedure
In a fully interactive procedure, statements are executed immediately. The RUN statement has no effect. Examples include PROC IML, PROC SQL, and PROC OPTMODEL. You use the QUIT statement to exit these procedures, which means that the RUN statement is never needed. The following program is correct and runs three statements. In interactive mode, each statement gets run when the SAS parser reaches the semicolon that ends the statement.
proc sql;
create table Example (x num, y num);          /* statement 1 */
insert into Example
   values(1, 2) values(3, 4) values(5, 6);    /* statement 2 */
select *
   from Example;                              /* statement 3 */
/* no RUN statement required! */
A RUN statement in (some) procedures that support RUN-group processing
Some SAS procedures are partly interactive. Procedures such as PROC DATASETS, PROC REG, and PROC GLM support RUN-group processing. For these procedures, the RUN statement defines blocks of statements that get executed, but the procedure remains running until it encounters a QUIT statement.
Many SAS/STAT procedures interpret QUIT to mean "run the most recent statements and then quit." For these procedures, you do not need a RUN statement before you call QUIT. The following statements run a regression analysis and then quit the procedure:
proc glm data=sashelp.class;
   model weight = height;
quit;   /* <== No RUN statement; Runs previous statements, then quits */
Unfortunately, SAS procedures are not completely consistent in implementing the QUIT statement. In some SAS procedures the QUIT statement means "ignore the most recent statements and quit." The canonical examples are the traditional SAS/GRAPH procedures such as PROC GPLOT. In the following program, the first PLOT statement creates a scatter plot because it is followed by a RUN statement. However, the second plot statement is not followed by a RUN statement, so it is ignored and the second plot is not produced.
proc gplot data=sashelp.class;
   plot weight*height;
run;    /* <== executes previous PLOT statement; does not quit */
   plot weight*age;
quit;   /* <== ignores previous PLOT statement, then quits */
/* use RUN; QUIT; to produce the second plot */
If you aren't sure how a procedure behaves with regard to RUN-group processing, it is always safe to use the RUN and QUIT statements in tandem.
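As a minimal sketch of the "in tandem" advice, here is the PROC GPLOT example again with RUN before QUIT. It assumes SAS/GRAPH is licensed at your site. Based on the behavior described above, both plots should be produced.

```sas
proc gplot data=sashelp.class;
   plot weight*height;
run;     /* executes the first PLOT statement; procedure stays active */
   plot weight*age;
run;     /* executes the second PLOT statement */
quit;    /* nothing is pending, so QUIT only ends the procedure */
```

The same RUN; QUIT; pair is harmless in procedures such as PROC GLM, where QUIT would have run the pending statements anyway.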
When to include optional statements?
The previous sections describe unnecessary statements that I like to skip. However, sometimes I include optional statements in my programs for clarity, readability, or to practice defensive programming.
SAS supports many optional statements. When you omit an optional statement, the procedure falls back to its default behavior. For example, if you omit the VAR statement, most procedures run on all numeric variables (for example, PROC MEANS) or on all variables (for example, PROC PRINT). When I want the default behavior, I skip the VAR statement.
Another statement that is technically optional is the RUN statement for a sequence of procedures. Because the next call to a procedure or DATA step will always end the previous procedure, you can technically omit the RUN statement for all but the last procedure. This means that the following program is valid, although I do not recommend this style of programming:
data class;
   set sashelp.class(where=(sex='M'));   /* 'class' is the _LAST_ data set */
proc means;                              /* DATA= _LAST_ */
proc print;                              /* DATA= _LAST_ */
run;
If I'm feeling lazy, I might write these statements during the early exploratory phase of a data analysis. However, for serious work I terminate every procedure by using a RUN or QUIT statement. Skipping a RUN statement can lead to undesirable interactions with global statements such as the TITLE statement and ODS statements.
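Here is a small sketch of the kind of TITLE interaction I have in mind. The titles and the VAR list are made up for illustration. Because a global statement takes effect as soon as it is encountered, and PROC MEANS does not execute until it reaches a step boundary, the second TITLE statement replaces the first before PROC MEANS ever runs.

```sas
title "Summary statistics";
proc means data=sashelp.class;
   var height weight;
/* no RUN here, so PROC MEANS has not executed yet */

title "First five rows";     /* replaces the previous title immediately */
proc print data=sashelp.class(obs=5);   /* this PROC is the step boundary for PROC MEANS */
run;
```

In my understanding, both the PROC MEANS output and the PROC PRINT output carry the title "First five rows". Putting a RUN statement after the VAR statement restores the intended titles.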
Your thoughts?
There is much more that can be said about these topics. What are your thoughts?
- Are there unnecessary statements that you write out of habit?
- Are there optional statements that you always include because they make the program clearer?
23 Comments
Rick, in the code below, the 'from example' was colored green, indicating that it was commented out by the '*' after 'select'. However, when I ran it in SAS, it was not commented out.
proc sql;
create table Example (x num, y num); /* statement 1 */
insert into Example
values(1, 2) values(3, 4) values(5, 6); /* statement 2 */
select *
from Example; /* statement 3 */
/* no RUN statement required! */
Yes. In general, the color-coding editor should be used as a guide, not as the definitive truth. The SAS grammar is smarter than the color-coder in the program editor, which only knows about general SAS syntax, not about the specific grammar of every procedure. Furthermore, you will sometimes see statements that are colored red because the editor does not recognize them, even though they are valid statements.
My guess would be that this isn't related to the syntax highlighter unless you wrote this in SAS Studio? I wonder whether you edited this in the article and ended up with the wrong style; I do that a lot by accident when I'm writing PowerPoints or papers. I've never seen the highlighter get this particular issue wrong :)
Ah, based on Joe's response, I realize now you are talking about the way that WordPress blogs color SAS syntax. Yes, whenever it sees '*' it colors green until the next semicolon. I don't know who wrote the WordPress plug-in, but it isn't an "official" SAS product. I appreciate that it gets a lot of the syntax correct!
One minor note: It's been a very long time since I was on a mainframe but in a mainframe batch job, RUN statements are not needed. You may have to be careful with titles and macros but the step boundaries work just fine and the last step runs when it encounters the JCL end of deck card. RUN was added to SAS only after running jobs in TSO was added to the tool box.
Personally, I like to always have a RUN statement since it gives a strong visual indicator of the end of the step.
And, I have seen some people habitually write ++ instead of +. I forget their reasoning but they had a defense of the practice.
Actually there is a case on the mainframe (as well as all other OSs) where a RUN is needed: when you create a macro variable by using CALL SYMPUT and then immediately use that macro variable. For instance:
data _null_;
set sasuser.heart;
call symput ('mvar', weight);
%put weight is &mvar;
Without a step boundary, such as a RUN statement to complete the DATA step, the macro variable doesn't get created in time for the macro variable reference to use it.
Because the macro variable is created by the DATA step, the macro variable is not created until the DATA step runs.
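For comparison, here is a sketch of the same step with the RUN statement added. It assumes that sasuser.heart exists and has a WEIGHT variable, as in the snippet above.

```sas
data _null_;
   set sasuser.heart;
   call symput('mvar', weight);   /* overwritten on each row; the last value wins */
run;                              /* step boundary: the DATA step executes here */

%put weight is &mvar;             /* &mvar now exists when %PUT is processed */
```

Because WEIGHT is numeric, SAS converts it to character for CALL SYMPUT and writes a conversion note to the log. CALL SYMPUTX avoids the note and trims the blanks.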
I'm "guilty" of putting RUN; after data steps with CARDS/DATALINES, even though I know it's not necessary. I really like seeing that RUN; at the end of a step boundary.
I think the semicolon after a macro call is a really BAD thing. As you say, usually it just generates a null statement. But if you have a macro that is designed to generate only a part of a SAS statement (e.g. a function-style macro), and you end the macro call with a semicolon, that extra semicolon will often break things. So I agree this is a habit to avoid (although it's a very common habit, unfortunately).
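A small sketch of the function-style case follows. The macro name %HALF is hypothetical and exists only for this illustration. The macro generates an expression fragment, so the call must not be followed by a semicolon when it sits in the middle of a statement.

```sas
/* hypothetical function-style macro: generates part of an expression */
%macro half(x);
   (&x / 2)
%mend;

data _null_;
   y = %half(10) + 1;      /* expands to: y = (10 / 2) + 1; */
   put y=;                 /* writes y=6 to the log */
run;
```

If the call were written as `%half(10);`, the assignment would end at that semicolon and the trailing `+ 1;` would be left behind as a separate, invalid statement.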
Agreed that whether or not a semicolon goes after a macro should be a conscious choice and not merely a blind habit. I think it's good to have the semicolon there when you know the macro expands to a complete set of statements since it clarifies the syntactic role of the macro for other programmers (and syntax highlighters that aren't able to do the expansion themselves).
Is a GLOBAL statement required for every global macro variable defined? I include it since in other program languages it is a declaration statement. But it doesn't seem to be required in SAS. Your thoughts?
Hi Mike D,
While it isn't required to declare global macro variables, it is good practice to be definitive as to what symbol table your macro variable is being created in. Especially when you are defining macro definitions and have local symbol tables too. You might find this flowchart useful - http://support.sas.com/documentation/cdl/en/mcrolref/67912/HTML/default/viewer.htm#p0g7wk5o8cji7ln16v9bmzu5xjto.htm
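As a minimal sketch of being explicit about symbol tables, the following macro declares one global and one local macro variable. The names %SETVARS, gFlag, and tmp are made up for this example.

```sas
%macro setvars(value);
   %global gFlag;          /* explicitly placed in the global symbol table */
   %local tmp;             /* explicitly placed in this macro's local table */
   %let gFlag = &value;
   %let tmp   = &value;
%mend;

%setvars(1)
%put gFlag=&gFlag;         /* gFlag survives after the macro ends */
/* &tmp does not exist out here; it was local to %SETVARS */
```

Declaring both kinds of variable makes the intent visible to a reader without requiring them to work through the symbol-table rules.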
Cheers,
Michelle
Breaking habits can be hard... Good post Rick. Perhaps this might encourage people to do more code reviews and/or have more SAS coding best practice discussions.
Sometimes I see people using run statements after global statements. For example:
title "My class data report";
run;
proc print data=sashelp.class;
run;
Similar to ending macro calls with semicolons, a lot of users will always include periods at the ends of macro variables, but the period is only necessary when the macro variable is concatenated with other text/macro variables to the right of it. For example:
where Make = "&selectMake.";
instead of
where Make = "&selectMake";
The latter works fine with no period necessary.
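To show where the period does matter, here is a short sketch. The macro variable values are hypothetical.

```sas
%let selectMake = Toyota;
%let prefix     = Sales;

%put Make is &selectMake;    /* no period needed */
%put Make is &selectMake.;   /* same result; the period is an optional delimiter */
%put &prefix._2020;          /* period required: resolves to Sales_2020 */
%put &prefix..csv;           /* first period delimits, second is kept: Sales.csv */
%put &prefix_2020;           /* SAS looks for a variable named PREFIX_2020 and warns */
```

The last line produces an "apparent symbolic reference not resolved" warning unless a macro variable named PREFIX_2020 happens to exist.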
Except ... I run code across multiple platforms. What might not be required in one may be required in another. So, rather than trying to remember if the code needs a semi-colon, or one or two periods after a macro variable concatenation, I just try to code for the most universal. A few unnecessary characters, but it saves a lot of time and frustration when something that ran perfectly fine on one platform blows up on another.
Habits are hard to break. In this case, developing good habits ... do you really want to break them for the sake of a few keystrokes?
Can you provide an example of a syntax that is platform-specific?
Rick, I understand the point you are making in your post, but respectfully disagree with some of your points. For quick, ad-hoc purposes, many of the code shortcuts you outline above are perfectly legitimate. However, for large, complex production routines, these types of code practices can be extremely problematic. As one of the masses of SAS developers who have had the unfortunate task of unraveling SAS routines developed by others for maintenance, enhancements or, more often, to correct a fundamental flaw in the routine, I can say that these 'shortcut' coding practices can and often do cause problems. Many of the problems I've had to solve in routines were related to these 'shortcut' coding practices because the developer(s) did not fully understand how they might affect the rest of their logic. Instead, they make incorrect assumptions that a ";" here or there makes everything OK. Another issue, again generally with larger, more complex routines, is simple readability. I truly applaud some of the SAS gurus who can condense extremely complex data logic into a single statement...but oftentimes, it falls on someone else to have to maintain it later.
Again, this is just my opinion, but I believe there's something to be said for solid coding practices that adhere to basic standards (e.g. use a RUN; statement after each DATA Step; include COMMENTS, etc.) and develop for maintainability and readability.
Thanks for writing. I welcome diverse opinions. I agree with what you say: what one programmer sees as "unnecessary" might be a necessary defensive technique for someone else. I know a good programmer who puts
RUN; QUIT;
after every SAS procedure and DATA step. Another programmer constantly puts (unnecessary) parentheses around the logical statements in IF-THEN/ELSE clauses. I understand why they make these choices.
If you search the internet for
SAS defensive programming techniques
you will find many tips and techniques for defensive programming in complex production routines. Some of them are rather clever.
Rick,
Another great blog post!
I admit to being guilty of the first two things you listed above. In addition, I cannot seem to break the habit of coding--and then quickly deleting--the way-long-defunct MACROGEN option in an OPTIONS statement when debugging a macro program:
options macrogen symbolgen mprint mlogic;
In addition to that, I say "you know" a lot when I am talking to people. So, bad habits don't strictly lend themselves to SAS programming:-)
Best of luck in all your SAS endeavors!
----MMMMIIIIKKKKEEEE
Hi Rick,
Thanks for this piece. There are some unnecessary statements that I use, most of them out of habit, but I don't really care much about them since they don't generate errors or produce wrong results.
Hi Rick
Clearly a topic drawing out issues of personal preference and team standard. Sometimes common practise at a site is built on limited "on the job training" and "examples as guidance", rather than professional standards.
An example of this echoes your preference - eliminate redundant code unless it supports clarity. I dislike a standard suggesting a dot should trail every macro variable.
I see several people put RUN statements after a LIBNAME statement. Bugs me to no end!
I am in the habit of putting semi-colons after my macro calls, just to make the color coding in the editor window happy and help prevent me from making syntax errors. Then I delete those semicolons, since it is poor programming practice.
Back in the late 80's/early 90's our job performance was based on the number of lines of code we wrote. Where number of lines = number of semicolons. You can bet we put a lot of superfluous RUN statements in there!
Like you, I try to eliminate unnecessary semicolons and RUN statements. I always cringe a bit when I see unnecessary semicolons after macro calls. So you might be surprised that I begin many of my macros with a semicolon. When I write a macro that consists of full steps (as opposed to a macro that generates a fragment of a statement or a partial step), I begin it with a semicolon:
%macro test( );;
   --- macro code ---
%mend;
Note the double semicolon. If a user of my macro leaves out a semicolon before calling my macro, he or she might get a puzzling error message from code deep inside my macro if I leave out the leading semicolon. With the leading semicolon, a more interpretable SAS error message typically comes out in this error situation.
Writing code for others demands these higher standards, but writing a quick fix needs no documentation - until it becomes a popular "work-around".
So I think we need coding habits that are economical, not only for the author, but also for those who "pick up the pieces" when methods are used outside their effective domain, and fail. Brevity is not the only gain from deprecating unnecessary SAS statements: those who follow will appreciate the clarity of purpose when they see "the code and nothing but the code".
Abhor redundancy in programming
A statement is not redundant if it clarifies.
Production code should not be expected to serve as a model, or example for learning.
But should the code generated by wizards serve in that way?
Should there be a warning in the code generated by a wizard (for example, the tasks of EG and DI and proc import) - redundant code might be generated here?