Do your SAS programs read extra-large volumes of data? Do they run multiple DATA steps and procedures one after the other for hours at a time? Two papers from MWSUG 2013 show how you can speed up those long-running SAS jobs. Although their approaches and environments differed, both authors made use of hardware with multiple CPUs and the MP CONNECT feature of SAS/CONNECT software to run tasks in parallel. Both papers are excellent introductions to the topic.
Parallel processing can improve performance for SAS programs that are I/O intensive or that can be broken into multiple independent subtasks and run concurrently. According to both authors, the trick to gaining the most improvement is finding the optimum number of subtasks and deciding how much work should occur in each one.
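Neither paper's code is reproduced in this post, but a minimal MP CONNECT sketch shows the basic pattern both authors rely on: one SAS session spawns subtasks that run concurrently. The dataset names (part1, part2), the BY variable and the task names below are illustrative, not taken from either paper.

```sas
/* Minimal MP CONNECT sketch: run two independent sorts concurrently.
   Dataset, variable and task names are illustrative placeholders. */
options sascmd="!sascmd" autosignon;   /* spawn subtask sessions on this machine */

rsubmit task1 wait=no inheritlib=(work=pwork);
   proc sort data=pwork.part1; by id; run;
endrsubmit;

rsubmit task2 wait=no inheritlib=(work=pwork);
   proc sort data=pwork.part2; by id; run;
endrsubmit;

/* Block until both subtasks finish, then clean up the sessions */
waitfor _all_ task1 task2;
signoff _all_;
```

WAIT=NO is what makes the pattern parallel: control returns to the parent session immediately, and WAITFOR synchronizes at the end.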
Simulations can take a long time, a very long time, to process. When a statistician approached SAS programmer Jack Fuller for help, Fuller had to do some searching through SAS/CONNECT and SAS Grid Manager documentation to pull all the pieces together. Fuller’s first step was to move the analysis off PC SAS and onto the organization’s grid-enabled network. With a little work, a little benchmarking and some adjusting, Fuller helped reduce the time needed to run the analysis from approximately 5 hours to 30 minutes or less.
Fuller’s paper, Beating Gridlock, shares some of the lessons he learned: when to use parallel processing, how to use it, and other factors to consider. He reminds us that certain sections of code (such as initialization and finalization steps) cannot be parallelized; they become the limiting factor on how much speedup you can gain. A corollary: if a large portion of your program cannot be parallelized, it may not be a candidate for this technique at all. To arrive at the optimum number of subtasks, Fuller often starts by breaking the code into 5, 10 or 15 subtasks, then monitors performance over multiple trials to find the best count.
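Fuller's own code is not reproduced here, but trying 5, 10 or 15 subtasks is easiest when the subtask count is a parameter. A hedged sketch of such a macro, assuming the work has already been split into datasets chunk1..chunkN and using a PROC MEANS workload purely as a placeholder:

```sas
/* Sketch: parameterize the number of concurrent subtasks with a macro.
   The chunk&i datasets and the PROC MEANS workload are placeholders. */
%macro run_parallel(ntasks=5);
   options sascmd="!sascmd" autosignon;
   %do i = 1 %to &ntasks;
      rsubmit task&i wait=no inheritlib=(work=pwork);
         /* &i resolves in the parent session before the code is shipped */
         proc means data=pwork.chunk&i noprint;
            output out=pwork.stats&i mean=;
         run;
      endrsubmit;
   %end;
   /* Wait for every subtask, then disconnect all sessions */
   waitfor _all_ %do j = 1 %to &ntasks; task&j %end; ;
   signoff _all_;
%mend run_parallel;

%run_parallel(ntasks=10)   /* re-run with 5, 15, ... and compare the log timings */
```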
In Advanced Multithreading Techniques for Performance Improvement, Viraj Kumbhakarna takes an empirical approach, basing his recommendations on a case study that he ran both serially and in parallel using the multithreading capabilities of MP CONNECT. Kumbhakarna found that, as a rule of thumb, the processing time for parallelized tasks can be improved by a factor no greater than the number of CPUs available for processing. When he broke test programs into more subtasks than there were CPUs available, processing performance actually degraded.
Kumbhakarna also noted that in his research, multithreading yielded higher returns where CPU time and real elapsed time were not far apart. Like Fuller, he points out that the number of threads into which jobs are broken is an important factor in performance improvement. He suggests that programmers can determine the most effective number of subtasks by executing each job multiple times and selecting the job (and its number of subtasks) with the least difference between real time and CPU time.
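Comparing real time against CPU time requires both figures to appear in the log. The FULLSTIMER system option (a standard Base SAS option; this snippet is mine, not from Kumbhakarna's paper) prints extended timing statistics for every step:

```sas
/* FULLSTIMER makes the log report real time and CPU time for every
   step, so trials with different subtask counts can be compared. */
options fullstimer;

proc sort data=sashelp.class out=work.class_sorted;
   by weight;
run;
/* The log then shows lines such as:
      real time           0.03 seconds
      user cpu time       0.01 seconds
   Following Kumbhakarna's suggestion, prefer the subtask count
   whose real time comes closest to its CPU time. */
```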
Both authors provide lots of sample code showing how to create test data, how to break a program into subtasks and how to submit parallelized programs for processing.
Editor’s Note: You can find this paper and more at the MWSUG 2013 Proceedings.
4 Comments
I prefer to do most basic data processing in SAS, and I've used Ian J. Ghent's method for multi-threading long DATA steps on a single machine. It is very effective at speeding up processing, but it's cumbersome, so I am disappointed that SAS doesn't have this functionality built into DATA steps with a simple option like this:
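A hypothetical sketch of the kind of option the comment is wishing for. No such THREADS= DATA step option exists in SAS; this is purely illustrative:

```sas
/* HYPOTHETICAL syntax -- the DATA step has no such option.
   This sketches what a built-in threading option might look like. */
data out / threads=8;
   set in;
   /* long, row-independent computation */
run;
```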
I also use R for other tasks, and R's parallel processing is always easier than Ian J. Ghent's method and sometimes as simple as the suggestion above.
Andrew, in SAS 9.4, PROC DS2 can execute data processing code in parallel. You can find the documentation for threaded processing with PROC DS2 here:
http://support.sas.com/documentation/cdl/en/ds2ref/66009/HTML/default/viewer.htm#p0qykqw1fdra8dn1449vxg9ydfkk.htm
Here is a small example of starting 8 threads with "thousands of lines of code":
```sas
proc ds2;
   /*
    * Create thread program
    * Start in N threads with SET FROM statement
    */
   thread work_hard / overwrite=yes;
      method run();
         set bar;
         /* thousands of lines of code */
      end;
   endthread;

   /*
    * Start thread program in 8 threads with SET FROM statement
    */
   data foo;
      dcl thread work_hard t;
      method run();
         set from t threads=8;
      end;
   enddata;
run;
quit;
```
Researchers investigating parallel processing should be aware of Amdahl's Law, which provides an upper bound for the speedup you can obtain by running an analysis on multiple processors. SAS has provided multithreaded computations for many years, and Robert Cohen's 2002 paper, "SAS Meets Big Iron," is a good starting point to estimate the potential speedup due to distributed processing.
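Amdahl's Law states that if a fraction p of a program can be parallelized across n processors, the overall speedup is bounded by 1 / ((1 - p) + p/n). A quick DATA step tabulates that bound; the value p = 0.9 below is purely illustrative:

```sas
/* Amdahl's Law: speedup bound for a program that is fraction p
   parallelizable, run on n processors. p = 0.9 is illustrative. */
data amdahl;
   p = 0.9;
   do n = 1, 2, 4, 8, 16, 32;
      speedup = 1 / ((1 - p) + p / n);
      output;
   end;
run;

proc print data=amdahl noobs;
   var n speedup;
run;
/* Even with unlimited processors, the speedup can never
   exceed 1/(1-p), which is 10 for p = 0.9. */
```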
Rick . . .
Thanks for the additional reference to Robert Cohen's paper! Fuller included a graph and explanation of Amdahl's law in his paper.
Christina