Jedi SAS Tricks: The DSTODS2 Procedure

17
Mastering the SAS DS2 Procedure second edition - the DS to DS2 Procedure

What new things will I learn in the second edition of
Mastering the SAS DS2 Procedure?

You have the infrastructure, and you know that DATA step would run so much better in DS2, but it’s complicated and you don’t know how to get started. Well, if you have SAS 9.4M5, rejoice! The DStoDS2 procedure is here, and it’s a humdinger! (No, Chris, not a Hemedinger… ;-) It's one of the many new things you can learn about in the new second edition of "Mastering the SAS DS2 Procedure - Advanced Data Wrangling Techniques"

Here are a few important caveats we should be aware of before diving in:

  1. PROC DStoDS2 can’t translate all possible DATA step syntax, but it does support a huge subset. Any lines of code lines in the traditional DATA step that cannot be translated will appear as comments in the output. Any comments in the original DATA step program are removed.
  2. PROC DStoDS2 translates only one DATA step at a time. The DATA step to be translated must be saved to its own text file which will serve as input.
  3. In the traditional DATA step, character variable lengths are measured in bytes, but in DS2 fixed-length character variable lengths are specified in characters. If your data has multi-byte encoding, adjust any LENGTH statements in the original DATA step before running PROC DSTODS2 to specify the variable length in characters, so that the DS2 DATA program will specify the correct lengths upon conversion.
  4. The resulting DS2 program might not be syntactically complete. You will have to add a RUN statement and any commented sections will have to be converted by hand. Perhaps a few other tweaks will be necessary before it executes and produce the desired results.

Having said all of that, PROC DStoDS2 makes converting traditional DATA step code to DS2 a snap! Let’s try it out on a sample program that as a lot of typical DATA step features:

data audi   (drop=make count)
     bmw    (drop=make count)
     counts (keep=make count);
   if _n_=1 then do;
      put 'And so it begins!';
   end;
   set cars;
   by Make;
   if first.make then Count=0;
   Count+1;
   select (Make);
      when ("Audi") output audi;
      when ("BMW")  output bmw;
      otherwise put Make= Model= ' - how did YOU get in here?';
   end;
   if last.make then output counts;
run;

We’ll save that to a file named DStoDS2_data_step.sas. Next, we’ll write and execute a PROC DStoDS2 step to convert the DATA step program to DS2 and save the new program as DStoDS2_data_step_converted.sas:

proc dstods2 in="&path/DStoDS2_data_step.sas" 
             out="&path/DStoDS2_data_step_converted.sas";
run;

A quick look at the output shows it’s not too pretty, but it seems functional enough. Because I’m working in SAS Studio, a quick click on the Format Code button makes it more legible. For the purposes of the blog, I’ll format it a little by hand:

data AUDI(DROP=(MAKE COUNT) ) 
     BMW(DROP=(MAKE COUNT) ) 
     COUNTS(KEEP=(MAKE COUNT));
   method run();
   if _N_=1.0 then
      do;
         put 'And so it begins!';
      end;
   set CARS;
   by MAKE;
 
   if FIRST.MAKE then
      COUNT=0.0;
   COUNT + 1.0;
 
   select (MAKE);
      when ('Audi') output AUDI;
      when ('BMW') output BMW;
      otherwise put MAKE=MODEL=' - how did YOU get in here?';
   end;
 
   if LAST.MAKE then
      output COUNTS;
   ;
_return:
   ;
end;
enddata;

That’s not too bad, actually. I have to add the PROC DS2 statement and the RUN and QUIT, obviously. I’ll remove that unused _return: label. And for efficiency’s sake, I’ll convert the IF _N_ =1 DO group into an INIT method:

proc ds2;
data AUDI_DS2(DROP=(MAKE COUNT)) 
     BMW_DS2(DROP=(MAKE COUNT)) 
     COUNTS_DS2(KEEP=(MAKE COUNT));
   method init();
         put 'And so it begins!';
   end;
   method run();
      set CARS;
      by MAKE;
      if FIRST.MAKE then COUNT=0.0;
      COUNT + 1.0;
      select (MAKE);
         when ('Audi') output AUDI_DS2;
         when ('BMW') output BMW_DS2;
         otherwise put MAKE=MODEL=' - how did YOU get in here?';
      end;
      if LAST.MAKE then output COUNTS_DS2;
   end;
enddata;
run;
quit;

Now, that seemed almost too easy, didn't it? Let's run my shiny new DS2 program, and check the log to make sure everything is running smoothly. Oops! It looks like my DS2 program produced a warning in the log:

WARNING: No DECLARE for assigned-to variable count; creating it as a global variable 
         of type double.

So back I’ll go to the editor, and add a declaration for the COUNT variable. This time it runs without a warning. Now, I’m a suspicious guy, so I’ll sort the outputs from both processes into the same order and run PROC COMPARE to ensure it’s all working as expected. Sure enough, I get that lovely “No unequal values were found. All values compared are exactly equal.” report.

So that’s the scoop – PROC DStoDS2 can ingest some pretty complex DATA step code and do 99% of the work needed to convert it to DS2. Remember Jedi Programming Rule #1? If not, here you go:
“Lazy programmers are great programmers.”

So lazy programmers of the world, rejoice - the DStoDS2 procedure has arrived! And, if you’re too lazy to copy the code in this post by hand, you can download the ZIP file from this link.

Until next time, may the SAS be with you!
Mark

Tags
Share

About Author

SAS Jedi

Principal Technical Training Consultant

Mark Jordan (a.k.a. SAS Jedi) grew up in northeast Brazil as the son of Baptist missionaries. He served 20 years as a US Navy submariner, pursuing his passion for programming as a hobby. Upon retiring from the Navy in 1994, he turned his hobby into a dream job and has been a SAS programmer ever since. Mark writes and teaches a broad spectrum of SAS Foundation programming classes, and is proud to announce his first book, "Mastering the SAS® DS2 Procedure: Advanced Data Wrangling Techniques". When he isn’t writing, teaching, or posting “Jedi SAS Tricks” here on the SAS Learning Post, Mark enjoys playing with his grand and great-grandchildren, hanging out at the beach and reading science fiction novels. His secret obsession is flying toys – kites, rockets, drones – and though he usually tries to convince Lori that they are for the grandkids, she isn't buying it. Mark lives in Toano, VA with his wife, Lori, and their cat, the amazing Tiger Man. To connect with Mark, check out his SAS Press Author page, follow him on Twitter @SASJedi or connect on Facebook or LinkedIn.

Related Posts

17 Comments

    • SAS Jedi

      Tricky question! Yes, DS2 supports character variables longer than 32K. I have personally used VARCHAR variables in a DS2 program to hold JSON responses over 1 million characters long. But they were only used for processing. There's no place I know of where you can STORE a varchar that big. For example, in LATIN encoding, the largest character variable that can be stored in Teradata is 64,000 characters. And for SAS Data sets, the limit for character variables is still 32,767 bytes. SO yes, you can process them - but you probably can't store them anywhere...

  1. I've been having trouble getting this procedure to work in SAS University Edition. I tried copying your example and I always get this result:

    1 proc dstods2 in='/folders/myfolders/DStoDS2_data_step.sas'
    2 out='/folders/myfolders/DStoDS2_data_step_converted.sas';
    3 run;

    ERROR: Could not generate formats text file.
    ERROR: Could not compile source code.
    NOTE: The SAS System stopped processing this step because of errors.

    The syntax of the procedure is simple enough that it's hard for me to see how I might have gone wrong, maybe it's a problem with the configuration of University Edition somehow?

  2. I tried the following code, but got an error. The SASHELP.CARS dataset works fine in a normal dataset, so I'm not sure what the problem is. Any ideas?

    proc ds2;
    data TEST;
    method run();
    set SASHELP.CARS;
    end;
    enddata;
    run;
    quit;

    177 proc ds2;
    ERROR: Client unable to establish connection.
    ERROR: The DATAPATH= option specifies physical paths, listed above, that do not exist or are
    inaccessible
    ERROR: PROC DS2 initialization failed.
    178 data TEST;
    179 method run();
    180 set SASHELP.CARS;
    181 end;
    182 enddata;
    183 run;
    184 quit;

    • I also tried the code from your example and got an error.

      17 proc ds2;
      18 data AUDI_DS2(DROP=(MAKE COUNT))
      19 BMW_DS2(DROP=(MAKE COUNT))
      20 COUNTS_DS2(KEEP=(MAKE COUNT))/overwrite=yes;
      21 dcl double Count;
      22 method init();
      23 put 'And so it begins!';
      24 end;
      25 method run();
      26 set CARS;
      27 by MAKE;
      28 if FIRST.MAKE then COUNT=0.0;
      29 COUNT + 1.0;
      30 select (MAKE);
      31 when ('Audi') output AUDI_DS2;
      32 when ('BMW') output BMW_DS2;
      33 otherwise put MAKE=MODEL=' - how did YOU get in here?';
      34 end;
      35 if LAST.MAKE then output COUNTS_DS2;
      36 end;
      37 enddata;
      38 run;
      ERROR: Compilation error.
      ERROR: BASE driver, Table CARS does not exist or cannot be accessed or created
      ERROR: Table "WORK.CARS" does not exist or cannot be accessed
      ERROR: Line 26: Unable to prepare SELECT statement for table CARS (rc=0x80fff802U).
      39 quit;

      NOTE: The SAS System stopped processing this step because of errors.
      NOTE: PROCEDURE DS2 used (Total process time):
      real time 0.20 seconds
      cpu time 0.17 seconds

    • SAS Jedi

      Robert,
      DS2 does not support concatenated SAS libraries (see Usage Note 51043: Librefs with concatenated libraries are not supported in DS2 - https://support.sas.com/kb/51/043.html)

      Try this:
      libname ds2help "!SASROOT\core\sashelp";
      proc ds2;
      data TEST;
      method run();
      set DS2HELP.CARS;
      end;
      enddata;
      run;
      quit;

      May the SAS be with you!
      Mark

  3. Thanks for the reply. The test code works fine. However, our goal was to try and use the DS2 procedure to speed up data access to some of our large datasets (100+ million obs) that were created using the SPDE libname engine. I guess it wouldn't work on these, either, since the libref is a series of concatenated paths. Do you know if this problem be addressed in a future release?

    • SAS Jedi

      SPDE libraries are not concatenated libraries. They are distributed libraries using the SPDE engine, which is a completely different beast. SPDE is explicitly listed as a supported DS2 data source and should be no problem to access using DS2. For example, this code runs flawlessly on my SAS 9.4M5 installation (Windows):
      libname slib spde 'c:\sasdata\spde\mainpath'
      datapath=('c:\sasdata\spde\data1' 'd:\sasdata\spde\data2')
      indexpath=('c:\sasdata\spde\index');

      proc ds2 libs=(slib work);
      thread comp /overwrite=yes;
      dcl double total;
      method run();
      set slib.narrow;
      by group;
      if first.group then total=0;
      total+normal;
      if last.group then output;
      end;
      endthread;
      run;

      data summary/overwrite=yes ;
      dcl thread comp c();
      method run();
      set from c threads=4;
      end;
      enddata;
      run;
      quit;

  4. Your example works, but when I try it with one of our SPDE libnames, it doesn't work.

    269
    270 libname YR4_12_1 spde 'h:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012'
    271 datapath=('m:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012'
    272 'p:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012'
    273 'l:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012'
    274 'k:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012')
    275 indexpath=('j:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012'
    276 'i:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012')
    277 bysort=no
    278 compress=binary;
    NOTE: Libref YR4_12_1 was successfully assigned as follows:
    Engine: SPDE
    Physical Name: h:\studies\PV\data\original\CCW64\2012 - new 5pct - 2012\
    279
    280
    281 proc ds2 libs=(YR4_12_1 work);
    ERROR: Client unable to establish connection.
    ERROR: The DATAPATH= option specifies physical paths, listed above, that do not exist or are
    inaccessible
    ERROR: PROC DS2 initialization failed.
    282 thread comp /overwrite=yes;
    283 dcl double total;
    284 method run();
    285 set YR4_12_1.BENEFICIARY_SUMMARY_FILE;;
    286 end;
    287 endthread;
    288 run;
    289 quit;

    NOTE: The SAS System stopped processing this step because of errors.
    NOTE: PROCEDURE DS2 used (Total process time):

  5. What I've discovered through trial & error is that a SPDE libname that has any spaces in the paths doesn't work. For example, the first libname below works fine, but the second one doesn't work.

    libname slib spde 'c:\test\'
    datapath=('c:\test\a' 'c:\test\b\')
    indexpath=('c:\test\b\');

    libname slib spde 'c:\test\'
    datapath=('c:\test\a a\' 'c:\test\b b\')
    indexpath=('c:\test\b b\');

    • SAS Jedi

      While this is a great place to discuss general or specific coding issues, it's not a great place to get help for a issues which may depend on your specific SAS installation. SAS Technical Support is the best route to a quick solution for problems like this. I'm going to open a tech support track on this myself.

      As a reminder, for urgent, business-critical issues call +1 (919) 677-8008. Otherwise, I'd suggest using the SAS Tech Support Web Form (http://support.sas.com/ctx/supportform/createForm).

      Before contacting SAS Tech Support, gather this information and have it handy:
      Your company name and the country in which you are operating
      Your name, e-mail address and phone number
      Site number, operating system and software release (see top of a fresh SAS session’s log)
      SAS product involved (for this problem, the product in Base SAS)
      Succinct one-line description of the issue (subject line)
      Detailed problem description, including ERROR OR WARNING messages
      A description of any troubleshooting or research you’ve already done
      Copies of your SAS program, SAS log & any other pertinent files.
      After submitting the web form, you'll receive an automated email with a tracking number. Reply to that email and attach copies of your SAS program and the SAS log for the session which shows the errors / problem.

      I hope this helps!
      Mark

    • SAS Jedi

      Well, it looks like we identified a bug! The response to my tech support track indicated that a fix is planned for a future release which, at the present time, is scheduled for release by the end of 2018. However, as of this writing no fix is available. And because path names cannot be modified for existing SPDE libraries (see http://support.sas.com/kb/14/028.html), no work-around is available either. To avoid this problem going forward, please ensure the paths and file names for components of SPDE libraries do not contain spaces.

Leave A Reply

Back to Top