Jedi SAS Tricks: Warp Speed DATA Steps with DS2

I remember the first time I was faced with the challenge of parallelizing a DATA step process. It was 2001 and SAS V8.1 was shiny and new. We were processing very large data sets, and the computations performed on each record were quite complex. The processing was crawling along on impulse power and I felt the need - the need for warp speed!

From the SAS log we could see that elapsed time was almost exactly equal to CPU time, so we surmised that the process was CPU bound. With SAS/CONNECT licensed on our well-provisioned UNIX SAS server and an amazing SUGI paper extolling the virtues of parallel processing with MPCONNECT in hand, we set out to chart a course in this brave new world. The concept behind MPCONNECT is to write a SAS control program that breaks your data up into smaller pieces, spawns several identical DATA step jobs to process the pieces in parallel, monitors progress until they all finish, then reassembles the individual outputs to obtain the final results. Labor intensive, for sure, but it definitely accelerated processing of CPU-bound jobs.
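The control program followed a pattern something like this skeleton (the task names, the PWORK libref, the chunk data sets, and the splitting and reassembly steps are all placeholders here, not the actual program):

options autosignon sascmd='!sascmd';  /* spawn local SAS sessions on demand */

/* Run two identical DATA steps in parallel, sharing the parent WORK library */
rsubmit task1 wait=no inheritlib=(work=pwork);
   data pwork.out1;
      set pwork.chunk1;
      /* ...the complex computations... */
   run;
endrsubmit;

rsubmit task2 wait=no inheritlib=(work=pwork);
   data pwork.out2;
      set pwork.chunk2;
      /* ...the complex computations... */
   run;
endrsubmit;

waitfor _all_ task1 task2;  /* block until both tasks finish */
signoff _all_;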

But now I have SAS 9.4 with the new DS2 programming language, which was built from the ground up with threading in mind - and suddenly, parallel processing with the DATA step just became a whole lot easier! For example, here is a (senseless, I'll admit) CPU-intensive Base SAS DATA step program:

data t1;
   array score[0:100];
   set t end=last;
   do i=lbound(score) to hbound(score);
      score[i]= (sqrt(((id * ru * rn) / (id + rn + ru))*id))*
                (sqrt(((id * ru * rn) / (id + rn + ru))*id));
   end;
   count+1;
   if last then put 'Data step processed ' count 'observations.';
   drop i count;
run;

When executed, this process consumes about the same amount of CPU time as elapsed time:

NOTE: DATA statement used (Total process time):
      real time           5.20 seconds
      cpu time            5.11 seconds

I suspect the process is CPU bound and could benefit from threading. First, I’ll try this as a straight DS2 DATA step:

proc ds2;
data t2/overwrite=yes;
   dcl bigint count;
   drop count;
   vararray double score[0:100] score0-score100;
   method run();
      dcl int i;
      set t;
      do i=lbound(score) to hbound(score);
         score[i]= (sqrt(((id * ru * rn) / (id + rn + ru))*id))*
                   (sqrt(((id * ru * rn) / (id + rn + ru))*id));
      end;
      count+1;
   end;
   method term();
      put 'DS2 Data step processed' count 'observations.';
   end;
enddata;
run;
quit;

This process is still running single-threaded, and uses about the same resources and elapsed time as the original, with a little extra (as expected) for the PROC overhead:

NOTE: PROCEDURE DS2 used (Total process time):
      real time           5.98 seconds
      cpu time            5.86 seconds

Now, let’s convert the process to a thread. First, we create the THREAD program, which will be stored in a SAS library - I’ll use WORK in this case. To convert the DS2 DATA step to a THREAD program, I simply change the DATA statement to a THREAD statement and the ENDDATA statement to ENDTHREAD:

proc ds2;
thread th2/overwrite=yes;
   dcl bigint count;
   drop count;
   vararray double score[0:100] score0-score100;
   method run();
      dcl int i;
      set t;
      do i=lbound(score) to hbound(score);
         score[i]= (sqrt(((id * ru * rn) / (id + rn + ru))*id))*
                   (sqrt(((id * ru * rn) / (id + rn + ru))*id));
      end;
      count+1;
   end;
   method term();
      /* Make each thread report how many obs it processed */
      put 'Thread' _threadid_ ' processed' count 'observations.';
   end;
endthread;
run;
quit;

Executing that program creates the thread and stores it in the WORK library in a data set named th2. Now I’ll write a short DS2 DATA step to run four instances of the thread in parallel:

proc ds2;
/* Multi-threaded */
data th4/overwrite=yes;
   dcl thread th2 t;
   method run();
      set from t threads=4;
   end;
enddata;
run;
quit;

And the clock time is significantly reduced, at the expense of some extra CPU time. Note that the CPU time is now longer than the elapsed time, indicating that operations were conducted in parallel. The PUT statement in the thread’s TERM method reports how many observations each thread processed.

Thread 3  processed 281152 observations.
Thread 2  processed 219648 observations.
Thread 1  processed 294528 observations.
Thread 0  processed 204672 observations.
NOTE: PROCEDURE DS2 used (Total process time):
      real time           3.20 seconds
      cpu time            9.20 seconds

Our threaded process cut the elapsed time almost in half!

That’s all I have for this time. As usual, you can download a ZIP file containing a copy of this blog entry and the code used to create it from this link.

Now I’m off to participate in SAS Global Forum 2015 in Dallas. There are tons of presentations that talk about DS2, SAS in-database processing, and using SAS with Hadoop. Look me up! I can be found at the #SASGF15 #TweetUp Saturday night, attending various presentations (especially about DS2 and Hadoop), or hanging out in the Quad on Tuesday afternoon from 2 to 2:30 pm to answer your questions about SAS Foundation programming or DS2. I'm also teaching the post-conference DS2 Programming Essentials class at the conference center. So, I hope to see you there.

Until next time, may the SAS be with you!
Mark

About Author

SAS Jedi

Principal Technical Training Consultant

Mark Jordan (a.k.a. SAS Jedi) grew up in northeast Brazil as the son of Baptist missionaries. After 20 years as a US Navy submariner pursuing his passion for programming as a hobby, in 1994 he retired, turned his hobby into a dream job, and has been a SAS programmer ever since. Mark writes and teaches a broad spectrum of SAS programming classes, and his book, "Mastering the SAS® DS2 Procedure: Advanced Data Wrangling Techniques" is in its second edition. When he isn’t writing, teaching, or posting “Jedi SAS Tricks”, Mark enjoys playing with his grand and great-grandchildren, hanging out at the beach, and reading science fiction novels. His secret obsession is flying toys – kites, rockets, drones – and though he usually tries to convince Lori that they are for the grandkids, she isn't buying it. Mark lives in historic Williamsburg, VA with his wife, Lori, and Stella, their cat. To connect with Mark, check out his SAS Press Author page, follow him on Twitter @SASJedi or connect on Facebook or LinkedIn.

17 Comments

  1. Well, actually I meant "9 hour" not "hour". Turning an hour job into a 3 hour job is hardly to parallel processing's credit. In other words, what we did was to turn a 9 hour job into a 3 hour job, with the primary read going from 4 hours to less than 45 minutes.

    While I got parallel processing to work, apparently I haven't yet mastered the art of the keyboard.

    Jim

    • SAS Jedi

      Thanks for sharing your paper here, Jim. It's the clearest guide to multi-threading in BASE SAS without SAS/CONNECT I've ever seen. I particularly liked how you state that you're not solving for system efficiency - you're solving for SPEED. In today's world, I think speed is really what it's all about. I remember doing similar things to mitigate super long run times when I worked at the bank.

      That said, I'm super stoked about the emergence of new SAS parallel processing technologies like DS2 and the SAS Viya Cloud Analytic Server (CAS), which take a lot of the work out of this parallel processing stuff while making our SAS code run like greased lightning :-)

      May the SAS be with you!
      Mark

  2. Jim Barbour on

    Dear Mark,

    That graphic is exceptionally clear and helpful. Thank you for that, but, rats!, I was hoping DS2 would help me in my specific case. I'm reading very large plain text files consisting of up to one billion observations (although 50 to 100 million obs is the more typical case). Once the text is read into SAS data sets, I then perform certain calculations and manipulations, none of which are particularly CPU intensive. Not a good case for DS2 per your graphic.

    I have some half-formed ideas about using the input process to write to not one but four SAS datasets, each containing roughly 25% of the whole. I would then manually (ugh) run four simultaneous SAS jobs to do the processing, hopefully reducing the processing time by up to 75%. The results would then be concatenated, manually, into a final results set. Since the results are an aggregation, the final concatenation should be relatively fast.
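    A minimal sketch of that split-on-input idea (the file name and input variables here are hypothetical placeholders):

    /* Hypothetical: round-robin the input records into four data sets */
    data part1 part2 part3 part4;
       infile 'very_large_file.txt';
       input id $ from_date :mmddyy10. thru_date :mmddyy10.;
       select (mod(_n_, 4));
          when (1) output part1;
          when (2) output part2;
          when (3) output part3;
          otherwise output part4;
       end;
    run;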

    We are in the process of an RDBMS implementation, so I'll definitely keep the knowledge of multiple read threads in mind.

    Jim

    • SAS Jedi

      Sigh... Your problem sounds like a most excellent use case for the SAS/Access to Hadoop with the In-Database Code Accelerator. Then we could just drop your text file in Hadoop, create a Hive descriptor and do some massively parallel processing with DS2!

      For your planned approach, if you have SAS/Connect licensed it might be easier than you think! You'll want to use MPCONNECT techniques to get the DATA steps to run in parallel on the same SAS server. PROC SCAPROC can be very useful in helping to "parallelize" your SAS program. I've uploaded a sample program and output ZIP file to give you some more ideas :-)

      May the SAS be with you!
      Mark

      • Jim Barbour on

        Mark,

        We are actively headed in that direction (Hadoop/Hive), but I'm imagining that to be a year, maybe two, away. I'd be curious how long it takes to "drop" the text into Hadoop and create a Hive descriptor, but boy is that MPP appealing.

        We don't have SAS/Connect, at least on the UNIX box I typically use (possibly it's on our mainframe), but let me look at the various examples you've given of MPCONNECT techniques. If it looks somewhat less than terrifying, I may propose it as an interim solution (assuming the licensing cost is not overly prohibitive). We're in the process of rolling out SAS 9.4, EG 7.1x, and implementing SAS Enterprise Miner, so perhaps my timing is good.

        Thank you! (very much) for the suggestions and code. Very much appreciated,

        Jim

  3. Jim Barbour on

    Dear Mark,

    I've seen a number of references to DS2 being ideal for "CPU bound" processes. What about processes that simply have very large numbers of observations? Just sitting here on the very edges of knowledge of DS2 (I've now figured out how to spell DS2), it would seem that DS2 might work well for processing SAS data sets with, say, 100 million+ observations. As it is now with a traditional DATA step, it takes multiple hours to process some of our SAS data sets. Checking the log reveals that clock-on-the-wall time is consistently greater than CPU time. It would seem that our jobs are data, not CPU, bound.

    Would it not be faster in terms of clock-on-the-wall time to use DS2? I would think that the overall time of the job could be shortened by breaking a 100 million observation SAS data set into chunks of, say, about 25 million observations each, processing said chunks in parallel, and compiling the results after all chunks have finished processing. Am I off in my thinking here? Are the advantages of DS2 restricted to only CPU bound processes?

    Thank you,

    Jim

    • SAS Jedi

      Jim,
      Thanks for taking the time to ask an interesting question :-) When threading on the SAS platform, the DS2 process is fed by a single read thread to ensure the data blocks are properly distributed to the threads (all blocks are used, and none repeated). As a result, a process that is I/O bound will not profit from threading on the SAS platform. However, if you are processing RDBMS data from a supported platform with the DS2 In-Database Code Accelerator installed, the code can be inserted into the RDBMS to run fully distributed there. I hope this illustration helps:
      [Illustration: DS2 Anatomy]

      Bottom line: having more CPUs waiting won't help if the latency is in delivering the next record for processing.
      May the SAS be with you!
      Mark

  4. Pingback: Jedi SAS Tricks - Maximum Warp with Hadoop - The SAS Training Post

  5. Hi Mark,

    This post inspired me to try my hand at processing with DS2. I've had some success but am running into a problem I can't quite figure out -- and the error message isn't pointing me anywhere useful.

    Using DS2 is new to me, so I'm starting off simple, with a data set that contains an ID and two date fields (from and thru). I want to check that from_date is greater than a hard-coded date, for example: from_date > 1/1/2015. To do this, I'm doing the following:


    declare double dt_min;
    dt_min = inputn('01/01/2015', 'mmddyy10.');

    if from_date > dt_min then output;

    This works without error when I use just a data step within DS2. I'm now trying to create a thread to do the operation and am getting this error:


    ERROR: Unexpected error detected in function inputn.

    Additionally, the "NOTE: Execution succeeded. ##### rows affected" message in the log shows a different row count each time I run the program. It seems like that might indicate a resource issue? I'd appreciate any feedback! The above error isn't helping me too much right now. Thank you!

    • SAS Jedi

      What version of SAS are you running? I am running SAS9.4M2 and I'm not able to reproduce this error. Here's what I tried:

      /* First make some data to play with */
      data test;
         call streaminit(12345);
         do id=1 to 5;
            from_date=ceil(rand('NORMAL',20050,90));
            to_date=from_date+ceil(rand('UNIFORM')*5);
            output;
         end;
      run;

      proc print;
         id id;
      run;

      /* Now subset and format in DS2 */
      proc ds2;
      data test2/overwrite=yes;
         declare double from_date to_date dt_min having format mmddyy10.;
         retain dt_min;
         method init();
            dt_min = inputn('01/01/2015', 'mmddyy10.');
         end;
         method run();
            set test;
            if from_date > dt_min then output;
         end;
      enddata;
      run;
      quit;

      proc print data=test2;
         id id;
      run;

      This produced the expected results for me without any warnings or errors.

      • I think you have me on the right track, thank you!

        I had dt_min = inputn('01/01/2015', 'mmddyy10.'); within a user-defined method that was being called from the run() method, and I also did not have a RETAIN statement. I moved that snippet into the init() method, added a RETAIN statement, and now my log is error-free.
