SAS author’s tip: Macro language timing is everything

This SAS tip is from Robert Virgile and his book “SAS Macro Language Magic: Discovering Advanced Techniques”.

We hope you find this tip useful. You can also read an excerpt from Virgile’s book.

In macro language, as in life, timing is everything.  Macro language students need to learn the timing of the DATA step, the timing of macro language, and the relationship between the two.

Let’s begin with the DATA step.  All DATA steps operate in two separate phases:

  1. The compilation phase. In a nutshell, the software checks the syntax of the DATA step statements, and sets up storage space in memory to hold each variable.
  2. The execution phase. Given that there are no syntax errors, the software executes the DATA step … reading data, performing calculations, outputting results.

Macro language statements may have an impact on step 1, the compilation phase.  The resolution of macro variables affects the statements within the DATA step:

%let dataset=MALES;
data &dataset;
   set everyone;
   if gender='M';
run;
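
The resolution of &dataset in the step above can be observed in the log with the SYMBOLGEN system option, which reports each macro variable as it resolves. A minimal sketch, assuming a data set named EVERYONE with a GENDER variable already exists:

```sas
options symbolgen;     /* write each macro variable resolution to the log */

%let dataset=MALES;
data &dataset;         /* the compiler receives: data MALES; */
   set everyone;
   if gender='M';
run;

options nosymbolgen;   /* turn the extra log messages back off */
```

The log should contain a line similar to "SYMBOLGEN: Macro variable DATASET resolves to MALES", written while the DATA statement is being read and before any observation is processed.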

As the DATA step statements are read for compilation, the macro processor resolves &DATASET into MALES, so the statement that gets compiled is data MALES; and the name of the output data set becomes MALES.  However, macro language statements impact only the compilation phase, not the execution phase of the DATA step.  This concept is a frequent stumbling block when learning macro language.  To illustrate, consider this DATA step (before the programmer complicated it by adding macro language):

data MALES FEMALES;
   set everyone;
   if gender='M' then output MALES;
   else if gender='F' then output FEMALES;
run;

Perhaps the programmer was trying to learn macro language, and using this as an experiment.  Perhaps the programmer sought job security.  But the simple DATA step above morphed into this nonworking version:

data MALES FEMALES;
   set everyone;
   if gender='M' then do;
      %let dataset=MALES;
   end;
   else if gender='F' then do;
      %let dataset=FEMALES;
   end;
   output &dataset;
run;

Mistakenly, the programmer believed that %LET statements could execute as part of the DATA step.  That is never true.  %LET statements execute immediately, as soon as the macro processor encounters them … in this case before the compilation phase of the DATA step completes, and regardless of any IF/THEN logic around them.  So the program behaves as if it had been written this way:

%let dataset=MALES;
%let dataset=FEMALES;
data MALES FEMALES;
   set everyone;
   if gender='M' then do;
   end;
   else if gender='F' then do;
   end;
   output FEMALES;
run;

Clearly, the program revisions alter the outcome: every observation, whatever its GENDER, is written to FEMALES, and MALES is created with zero observations.  Remember these basics:

  • %LET statements are never part of a DATA step. Macro language statements execute immediately, and do not wait for the DATA step to begin executing.
  • If you need to control macro variables (either assigning or retrieving a value) while the DATA step executes, tools exist. But they are DATA step tools, not macro language tools.  The primary ones, CALL SYMPUT and SYMGET, will become the subject of a future article.
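
The first point can be checked with a small experiment. This is only a sketch; the message text is arbitrary, and DATA _NULL_ is used so that no data set is created:

```sas
data _null_;
   /* DATA step statement: runs during the execution phase, after RUN */
   put 'This line is written by the DATA step';
   /* Macro statement: runs as soon as the macro processor reads it */
   %put This line is written by the macro processor;
run;
```

In the log, the %PUT message appears while the step is still being read, ahead of the RUN statement. The message from the DATA step PUT statement appears only after RUN, even though it comes first in the program.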

Let’s consider another example that illustrates both timing and a basic use of CALL SYMPUT (in its CALL SYMPUTX form).  Once again, improper use of macro language complicates the program.  Here is the original version, without macro language:

data percentages;
   state_pop=0;
   do until (last.state);
      set cities;
      by state;
      state_pop + city_pop;
   end;
   do until (last.state);
      set cities;
      by state;
      percent_pop = city_pop / state_pop;
      output;
   end;
run;

For each STATE:

  • The top DO loop computes STATE_POP (the total population for the STATE).
  • The bottom DO loop reads the same observations, computes PERCENT_POP for each, and outputs the result.
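
To try the step above, some input is needed. The following sketch builds a small CITIES data set; the state codes, city names, and populations are invented purely for illustration, and the observations are already in STATE order, as the BY statement requires:

```sas
data cities;
   input state $ city $ city_pop;   /* hypothetical values, for testing only */
   datalines;
AA Alpha 100
AA Beta 300
BB Gamma 50
BB Delta 150
;
run;
```

With this input, the PERCENTAGES step computes a STATE_POP of 400 for AA and 200 for BB, so PERCENT_POP is 0.25 and 0.75 within each state.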

Now a macro language student might attempt a slightly different, nonworking variation:

data percentages;
   state_pop=0;
   do until (last.state);
      set cities;
      by state;
      state_pop + city_pop;
   end;
   call symputx ('denom', state_pop);
   do until (last.state);
      set cities;
      by state;
      percent_pop = city_pop / &denom;
      output;
   end;
run;

Bad timing is the critical issue:

  • Before the DATA step runs, &DENOM does not exist.
  • The software doesn’t begin to run the DATA step until it encounters the RUN statement.
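  • By that time, the macro processor has already tried to resolve &DENOM, while the statements were being read for compilation. Finding no such macro variable, it issues a warning that the reference was not resolved and leaves the text &denom in the statement, where it generates a syntax error. (If &DENOM happened to exist from earlier in the session, the step would compile, but it would divide every CITY_POP by that old value.)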
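
If the value really must pass through a macro variable while the step runs, then the retrieval has to happen at execution time as well. The sketch below assumes the DATA step function SYMGETN, which returns the current value of a macro variable as a number, in place of the &denom reference. It is an illustration of the timing only; the original version without macro language remains the simpler choice:

```sas
data percentages;
   state_pop=0;
   do until (last.state);
      set cities;
      by state;
      state_pop + city_pop;
   end;
   /* execution time: store this state's total in macro variable DENOM */
   call symputx ('denom', state_pop);
   do until (last.state);
      set cities;
      by state;
      /* execution time: fetch DENOM's current value as a number */
      percent_pop = city_pop / symgetn('denom');
      output;
   end;
run;
```

Because CALL SYMPUTX and SYMGETN both run during the execution phase, each state's cities are divided by that state's own total.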

There are many ways to introduce timing errors.  The remedy begins with understanding the relationship between macro language statements, DATA step compilation, and DATA step execution.  Most importantly, macro language statements execute immediately, and are never part of DATA step execution.
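
A pattern that respects this timing is to assign the macro variable in one step and reference it only after that step has finished. A minimal sketch, assuming the CITIES data set from the example above; the names TOTAL, TOTAL_POP, SHARES, and SHARE_OF_TOTAL are made up for illustration:

```sas
data _null_;
   set cities end=done;
   total_pop + city_pop;                           /* running total across all cities */
   if done then call symputx ('total', total_pop); /* assign on the last observation */
run;                                               /* step boundary: &total now exists */

%put Total population of all cities: &total;

data shares;
   set cities;
   share_of_total = city_pop / &total;             /* resolved while this step compiles */
run;
```

The reference to &total is safe here because the first DATA step has finished executing before the macro processor encounters it.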

For more information about the macro language and the magic you can create with it, check out Robert Virgile’s book “SAS Macro Language Magic: Discovering Advanced Techniques”.

About Author

Cindy Puryear

Senior Marketing Specialist, SAS Publications
