Jedi SAS Tricks: The DSTODS2 Procedure

8
Mastering the SAS DS2 Procedure second edition - the DS to DS2 Procedure

What new things will I learn in the second edition of
Mastering the SAS DS2 Procedure?

You have the infrastructure, and you know that DATA step would run so much better in DS2, but it’s complicated and you don’t know how to get started. Well, if you have SAS 9.4M5, rejoice! The DStoDS2 procedure is here, and it’s a humdinger! (No, Chris, not a Hemedinger… ;-) It's one of the many new things you can learn about in the new second edition of "Mastering the SAS DS2 Procedure - Advanced Data Wrangling Techniques"

Here are a few important caveats we should be aware of before diving in:

  1. PROC DStoDS2 can’t translate all possible DATA step syntax, but it does support a huge subset. Any lines of code lines in the traditional DATA step that cannot be translated will appear as comments in the output. Any comments in the original DATA step program are removed.
  2. PROC DStoDS2 translates only one DATA step at a time. The DATA step to be translated must be saved to its own text file which will serve as input.
  3. In the traditional DATA step, character variable lengths are measured in bytes, but in DS2 fixed-length character variable lengths are specified in characters. If your data has multi-byte encoding, adjust any LENGTH statements in the original DATA step before running PROC DSTODS2 to specify the variable length in characters, so that the DS2 DATA program will specify the correct lengths upon conversion.
  4. The resulting DS2 program might not be syntactically complete. You will have to add a RUN statement and any commented sections will have to be converted by hand. Perhaps a few other tweaks will be necessary before it executes and produce the desired results.

Having said all of that, PROC DStoDS2 makes converting traditional DATA step code to DS2 a snap! Let’s try it out on a sample program that as a lot of typical DATA step features:

data audi   (drop=make count)
     bmw    (drop=make count)
     counts (keep=make count);
   if _n_=1 then do;
      put 'And so it begins!';
   end;
   set cars;
   by Make;
   if first.make then Count=0;
   Count+1;
   select (Make);
      when ("Audi") output audi;
      when ("BMW")  output bmw;
      otherwise put Make= Model= ' - how did YOU get in here?';
   end;
   if last.make then output counts;
run;

We’ll save that to a file named DStoDS2_data_step.sas. Next, we’ll write and execute a PROC DStoDS2 step to convert the DATA step program to DS2 and save the new program as DStoDS2_data_step_converted.sas:

proc dstods2 in="&path/DStoDS2_data_step.sas" 
             out="&path/DStoDS2_data_step_converted.sas";
run;

A quick look at the output shows it’s not too pretty, but it seems functional enough. Because I’m working in SAS Studio, a quick click on the Format Code button makes it more legible. For the purposes of the blog, I’ll format it a little by hand:

data AUDI(DROP=(MAKE COUNT) ) 
     BMW(DROP=(MAKE COUNT) ) 
     COUNTS(KEEP=(MAKE COUNT));
   method run();
   if _N_=1.0 then
      do;
         put 'And so it begins!';
      end;
   set CARS;
   by MAKE;
 
   if FIRST.MAKE then
      COUNT=0.0;
   COUNT + 1.0;
 
   select (MAKE);
      when ('Audi') output AUDI;
      when ('BMW') output BMW;
      otherwise put MAKE=MODEL=' - how did YOU get in here?';
   end;
 
   if LAST.MAKE then
      output COUNTS;
   ;
_return:
   ;
end;
enddata;

That’s not too bad, actually. I have to add the PROC DS2 statement and the RUN and QUIT, obviously. I’ll remove that unused _return: label. And for efficiency’s sake, I’ll convert the IF _N_ =1 DO group into an INIT method:

proc ds2;
data AUDI_DS2(DROP=(MAKE COUNT)) 
     BMW_DS2(DROP=(MAKE COUNT)) 
     COUNTS_DS2(KEEP=(MAKE COUNT));
   method init();
         put 'And so it begins!';
   end;
   method run();
      set CARS;
      by MAKE;
      if FIRST.MAKE then COUNT=0.0;
      COUNT + 1.0;
      select (MAKE);
         when ('Audi') output AUDI_DS2;
         when ('BMW') output BMW_DS2;
         otherwise put MAKE=MODEL=' - how did YOU get in here?';
      end;
      if LAST.MAKE then output COUNTS_DS2;
   end;
enddata;
run;
quit;

Now, that seemed almost too easy, didn't it? Let's run my shiny new DS2 program, and check the log to make sure everything is running smoothly. Oops! It looks like my DS2 program produced a warning in the log:

WARNING: No DECLARE for assigned-to variable count; creating it as a global variable 
         of type double.

So back I’ll go to the editor, and add a declaration for the COUNT variable. This time it runs without a warning. Now, I’m a suspicious guy, so I’ll sort the outputs from both processes into the same order and run PROC COMPARE to ensure it’s all working as expected. Sure enough, I get that lovely “No unequal values were found. All values compared are exactly equal.” report.

So that’s the scoop – PROC DStoDS2 can ingest some pretty complex DATA step code and do 99% of the work needed to convert it to DS2. Remember Jedi Programming Rule #1? If not, here you go:
“Lazy programmers are great programmers.”

So lazy programmers of the world, rejoice - the DStoDS2 procedure has arrived! And, if you’re too lazy to copy the code in this post by hand, you can download the ZIP file from this link.

Until next time, may the SAS be with you!
Mark

Tags
Share

About Author

SAS Jedi

Principal Technical Training Consultant

Mark Jordan (a.k.a. SAS Jedi) grew up in northeast Brazil as the son of Baptist missionaries. He served 20 years as a US Navy submariner, pursuing his passion for programming as a hobby. Upon retiring from the Navy in 1994, he turned his hobby into a dream job and has been a SAS programmer ever since. Mark writes and teaches a broad spectrum of SAS Foundation programming classes, and is proud to announce his first book, "Mastering the SAS® DS2 Procedure: Advanced Data Wrangling Techniques". When he isn’t writing, teaching, or posting “Jedi SAS Tricks” here on the SAS Learning Post, Mark enjoys playing with his grand and great-grandchildren, hanging out at the beach and reading science fiction novels. His secret obsession is flying toys – kites, rockets, drones – and though he usually tries to convince Lori that they are for the grandkids, she isn't buying it. Mark lives in Toano, VA with his wife, Lori, and their cat, the amazing Tiger Man. To connect with Mark, check out his SAS Press Author page, follow him on Twitter @SASJedi or connect on Facebook or LinkedIn.

Related Posts

8 Comments

    • SAS Jedi

      Tricky question! Yes, DS2 supports character variables longer than 32K. I have personally used VARCHAR variables in a DS2 program to hold JSON responses over 1 million characters long. But they were only used for processing. There's no place I know of where you can STORE a varchar that big. For example, in LATIN encoding, the largest character variable that can be stored in Teradata is 64,000 characters. And for SAS Data sets, the limit for character variables is still 32,767 bytes. SO yes, you can process them - but you probably can't store them anywhere...

  1. I've been having trouble getting this procedure to work in SAS University Edition. I tried copying your example and I always get this result:

    1 proc dstods2 in='/folders/myfolders/DStoDS2_data_step.sas'
    2 out='/folders/myfolders/DStoDS2_data_step_converted.sas';
    3 run;

    ERROR: Could not generate formats text file.
    ERROR: Could not compile source code.
    NOTE: The SAS System stopped processing this step because of errors.

    The syntax of the procedure is simple enough that it's hard for me to see how I might have gone wrong, maybe it's a problem with the configuration of University Edition somehow?

Leave A Reply

Back to Top