Finding your Dream Car with a little macro magic

25

Suppose you want a list of car manufacturers from the CARS dataset. Easy!  Call the %CHARLIST macro from a %PUT statement, like this:

The CHARLIST macro generates a list of unique values of a selected variable from a selected dataset. So does PROC FREQ. But, if you don't need statistics, the CHARLIST macro is quick and easy to call, and can be called from anywhere.

Suppose you want a list of all available vehicle types to appear in a footnote. Easy! Call the %CHARLIST macro from a FOOTNOTE statement, like this:

proc print data=sashelp.cars(obs=10) noobs;
   var make model type msrp;
   title "First 10 rows from CARS dataset";
   footnote "Vehicle types include: %charlist(sashelp.cars,type)";
run;

 

You can't call PROC FREQ (or any proc) from a FOOTNOTE statement!

The CHARLIST macro:

%macro charlist(dsn,var) / minoperator mindelimiter=',';
	%local dsid rc list i;
	%let dsid=%sysfunc(open(&dsn(keep=&var)));
    	%syscall set(dsid);
	%let rc=%sysfunc(fetch(&dsid));
	%let list=&&&var;
    	%do i=2 %to %sysfunc(attrn(&dsid,nlobs));
	   %let rc=%sysfunc(fetch(&dsid));
	   %if not (%bquote(&&&var) in &list) %then %let list=&list, %bquote(%trim(&&&var));
	%end;
    	%let dsid=%sysfunc(close(&dsid));
	&list
%mend charlist;
  • The OPEN function opens SASHELP.CARS, keeping only the variable MAKE.
  • The SET routine creates a macro variable MAKE from the dataset variable MAKE.
  • The NLOBS argument of the ATTRN function returns the number of observations in a dataset, which serves as the stop value of the %DO loop.
  • The FETCH function reads observations from SASHELP.CARS, populating the macro variable MAKE.
  • If the current MAKE is not already in the macro variable LIST, it is appended to LIST.

Be sure to save the CHARLIST macro in your stored compiled macro or autocall library for convenient re-use!

The CARFINDER macro calls the CHARLIST macro. The CARFINDER macro accepts a CAR parameter (vehicle make) which is referenced in the WHERE and TITLE statements. More importantly, the CAR parameter is validated against a list of all car makes in the SASHELP.CARS dataset, to insure a valid car make is selected. If so, the CARFINDER macro issues a PROC PRINT. If a valid car make is not selected, the CARFINDER macro issues an error message, followed by a list of valid car makes to choose from. This is called data-driven parameter validation.

%macro carfinder(car) / minoperator mindelimiter=',';
 
   %if &car in %charlist(sashelp.cars,make) %then %do;
      proc print data=sashelp.cars noobs;
         var model type msrp;
         where make="&car";
         title "&car";
      run;
   %end;
 
   %else %do;
      %put ERROR: No &car.s.;
      %put NOTE: Cars include %charlist(sashelp.cars,make).;
   %end;
 
%mend carfinder;

 

Did you spot the inefficiency? The CARFINDER macro is inefficient because it potentially calls the CHARLIST macro twice, which then reads the CARS dataset twice. Avoiding this inefficiency is easy. Call the CARFINDER macro only once, from a %LET statement, to create the macro variable CARMAKES, which stores the list of valid car makes. Then reference the macro variable CARMAKES as needed.

%macro carfinder(car) / minoperator mindelimiter=',';
 
   %local carmakes;
   %let carmakes=%charlist(sashelp.cars,make);
 
   %if &car in &carmakes %then %do;
      proc print data=sashelp.cars noobs;
         var model type msrp;
         where make="&car";
         title "&car";
      run;
   %end;
 
   %else %do;
      %put ERROR: No &car.s.;
      %put NOTE: Cars include &carmakes.;
   %end;
 
%mend carfinder;

Happy car finding!

Share

About Author

Jim Simon

Principal Technical Training Consultant

Jim Simon is a principal instructor and course developer for SAS Education. Jim has a bachelor's degree from UCLA and a master's degree from California State University at Northridge. Prior to joining the SAS Irvine office in 1988, Jim was an instructor at Ventura College and a SAS programmer at The Medstat Group in Santa Barbara. Jim's areas of specialization include the DATA step, application development, web enablement, and the SAS macro language. A native of Southern California, Jim enjoys anything in the warm California sun. On weekends, Jim loves jumping in his Corvette, turning up the stereo, and cruising Pacific Coast Highway, top down, South to Laguna Beach or North to his old home town, Santa Barbara.

25 Comments

  1. The first variable I tried this on (FETAL_OUTCOME) failed because one value contained an ampersand (the abbreviation for labor and delivery):
    LIVEBORN
    STILLBORN - ALIVE ON L&D DELIVERY
    STILLBORN - DEMISE PRIOR TO ADMISSION

    Could anyone share the fix for this situation?
    Thanks,
    Sara

  2. Peter Lancashire on

    I prefer the SQL version for the reasons stated. It will be faster for indexed columns, as I assume the SQL engine just reads the index, not the table.
    .
    For me readability and therefore maintainability wins every time. The option to embed is a minor advantage. Now if SAS could be made to work like a proper programming language without a macro processor... Only dreaming!

  3. Claire Gallagher on

    Wow, really helpful. I applied the car finder macro to apply to any variable within any SAS dataset:
    %macro varfinder (dataset=,checkvar=,listvar=) / minoperator mindelimiter=',';

    %local ourlist;
    %let ourlist=%charlist(&dataset,&listvar);

    %if &checkvar in &ourlist %then %do;
    proc print data=&dataset noobs;
    where &listvar="&checkvar";
    title "&checkvar listings";
    run;
    %end;
    %else %do;
    %put ERROR: &checkvar not in list;
    %put NOTE: ourlist include &ourlist;
    %end;

    %mend varfinder;

  4. Wow, it's pretty special how you can call this macro from anywhere.

    FYI it can break if the data includes commas, such as:

    %put %charlist(sashelp.baseball, name);

    To answer the conversation yesterday, SQL is faster for large data. This took about a minute:

    %put %charlist(maps.world, segment);

    This took a fraction of a second:

    proc sql noprint;
    select distinct segment into :seglist separated by ','
    from maps.world;
    quit;
    %put &seglist.;

    • Jim Simon

      Thank you, Brian, for comparing efficiency, which turns out to be a major consideration. If you have time, you might try FETCHOBS instead of FETCH, and let us know how that compares.

      The NAME column in sashelp.baseball presents a special challenge due to embedded commas and apostrophes in names.
      I handled that by changing the delimiter from a comma to an asterisk and using %SUPERQ, like this:

      %macro charlist(dsn,var) / minoperator mindelimiter='*';
      %local dsid rc list i;
      %let dsid=%sysfunc(open(&dsn(keep=&var)));
      %syscall set(dsid);
      %let rc=%sysfunc(fetch(&dsid));
      %let list=%qtrim(%superq(&var));
      %do i=2 %to %sysfunc(attrn(&dsid,nlobs));
      %let rc=%sysfunc(fetch(&dsid));
      %if %superq(&var) ne %str( ) %then %if not (%superq(&var) in &list) %then %let list=&list*%qtrim(%superq(&var));
      %end;
      %let dsid=%sysfunc(close(&dsid));
      &list
      %mend charlist;

  5. I would argue that Leonid's version is more readable and maintainable, and of course could be made into a macro. I do wonder if the %charlist macro as presented sees better performance than Leonid's PROC SQL example. But unless there is a noticeable performance gap, I think the more readable proc sql version is better for clean code that is easy to understand.

    Either way, it's a neat macro function that I could see coming in handy. Nice work!

    • Jim Simon

      Thank you, Paul. I agree that code should be readable and maintainable. Leonid's code is concise, straightforward, and intuitive. And, for convenience, it can easily be made into a macro, as you noted. It would be interesting if somebody compared the two against a large dataset to note any efficiency difference.

      My "pure" macro code offers one advantage: It can be called "in-line" from anywhere, for example, from any macro language statement or expression (%PUT, %LET, %IF). It can also be called from SAS code (TITLE or FOOTNOTE statements), while a macro that generates SAS code does not offer that flexibility. The macro variable created by PROC SQL can, of course, be referenced anywhere. But there is the overhead of having to call the macro first to create the macro variable, and the overhead of storing the macro variable in the symbol table. Regardless, both are fine choices with their own advantages.

  6. Leonid Batkhan

    Hi Jim, nice macro implementation!

    Here is a non-macro version of it:

    proc sql noprint;
    select distinct make into : list separated by ', '
    from sashelp.cars;
    quit;

    %put &list;

    • Jim Simon

      Thank you, Jan. Did you try pointing to a database table with a LIBNAME statement before calling the macro?

  7. Hi Jim,

    Very cool! I can get this to work with a simple test dataset. But when I try it with a permanent dataset that we have, I get this error:

    SYMBOLGEN: Macro variable DSN resolves to sasnet.profile_plus_h
    SYMBOLGEN: Macro variable VAR resolves to perinatal_c
    671 %put %charlist(sasnet.profile_plus_h, perinatal_c);
    SYMBOLGEN: Macro variable DSID resolves to 15
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: Macro variable VAR resolves to perinatal_c
    SYMBOLGEN: Macro variable PERINATAL_C resolves to
    SYMBOLGEN: Macro variable DSID resolves to 15
    SYMBOLGEN: Macro variable DSID resolves to 15
    SYMBOLGEN: && resolves to &.
    SYMBOLGEN: Macro variable VAR resolves to perinatal_c
    SYMBOLGEN: Macro variable PERINATAL_C resolves to Level II
    SYMBOLGEN: Macro variable LIST resolves to
    ERROR: Operand missing for IN operator in argument to %EVAL function.
    ERROR: The macro CHARLIST will stop executing.

    The variable PERINATAL_C is a character variable with a length of 9. The values of PERINATAL_C have imbedded blanks. Could that be the problem?

    • Jim Simon

      Hey David. Thanks for posting. Imbedded blanks should not be a problem. My guess right now ... PERINATAL_C has a missing value on the first row. I'll check further and let you know what I find!

    • Jim Simon

      Yup. A missing value on any row is a problem. To fix, modify the %IF like this:

      %if &&&var ne %str( ) %then %if not (%bquote(&&&var) in &list) %then %let list=&list, %bquote(%trim(&&&var));

      Thanks, David, for reporting!

    • Jim Simon

      Need %BQUOTE!

      %if %bquote(&&&var) ne %str( ) %then %if not (%bquote(&&&var) in &list) %then %let list=&list, %bquote(%trim(&&&var));

    • Jim Simon

      Adraine, thank you for asking. I should have added this to my bullet list of explanations.

      A single ampersand is a direct reference to a macro variable.
      Multiple ampersands represent an indirect reference.
      Three ampersands cause the reference to be scanned twice, like this:

      original reference: &&&var
      after 1st scan: &make (&& resolves to &, &var resolves to make)
      after 2nd scan: Acura

Back to Top