Do your SAS programs read extra-large volumes of data? Do they run multiple DATA steps and procedures one after the other for hours at a time? Two papers from MWSUG 2013 show how you can speed up those long-running SAS jobs. Although their approaches and environments differed, both authors made use of hardware with multiple CPUs and the MP CONNECT feature of SAS/CONNECT software to run tasks in parallel. Both papers are excellent introductions to the topic.
Parallel processing can improve performance for SAS programs that are I/O intensive or that can be broken into multiple independent subtasks and run concurrently. According to both authors, the key to gaining the most improvement is finding the optimum number of subtasks and deciding how much work should occur in each one.
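Both papers build on the same basic MP CONNECT pattern: spawn additional SAS sessions on the same machine and hand each one an independent subtask. A minimal sketch of that pattern follows; the task names and data set names are illustrative, and INHERITLIB= is assumed to be available (SAS 9.2 or later) so the spawned sessions can see the client's WORK library.

```sas
/* Spawn local SAS sessions on demand and run two subtasks in parallel */
options autosignon=yes sascmd='!sascmd';

rsubmit task1 wait=no inheritlib=(work=cwork);  /* returns immediately */
   proc means data=cwork.claims_2012;           /* hypothetical subtask */
   run;
endrsubmit;

rsubmit task2 wait=no inheritlib=(work=cwork);  /* runs alongside task1 */
   proc means data=cwork.claims_2013;
   run;
endrsubmit;

waitfor _all_ task1 task2;   /* block until both subtasks finish */
signoff _all_;               /* end the spawned sessions */
```

The WAIT=NO option is what makes the tasks asynchronous; without it, each RSUBMIT would block and the subtasks would simply run one after the other.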
Simulations can take a long time, a very long time, to process. When a statistician approached SAS programmer Jack Fuller for help, Fuller had to do some searching through the SAS/CONNECT and SAS Grid Manager documentation to pull all the pieces together. His first step was to move the analysis from PC SAS to the organization’s grid-enabled network. With a little work, a little benchmarking and some adjusting, Fuller helped reduce the time needed to run the analysis from approximately 5 hours to 30 minutes or less.
Fuller’s paper, Beating Gridlock, shared some of the lessons he learned: when to use parallel processing, how to use it and other factors to consider. He reminds us that certain sections of code cannot be parallelized (such as initialization and finalization steps); they become the limiting factor on how much speedup you can gain. A corollary: if a large portion of your program cannot be parallelized, it may not be a candidate for this technique at all. To arrive at the optimum number of subtasks, Fuller often starts by breaking the code into 5, 10 or 15 subtasks, then monitors performance during multiple trials to home in on the best count.
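Fuller’s benchmark-and-adjust loop is easier to run when the subtask count is a macro parameter, so you can try 5, 10 or 15 tasks just by changing one argument. Here is one way that could be sketched; it is not Fuller’s actual code, and it assumes the input data set has a numeric id variable to split on (the data set, variable and macro names are all hypothetical).

```sas
%macro fan_out(ntasks=5);
   options autosignon=yes sascmd='!sascmd';

   %do i = 1 %to &ntasks;
      /* &ntasks and &i resolve on the client before the code is
         shipped, so each remote session receives a literal WHERE
         clause describing its own slice of the data */
      rsubmit task&i wait=no inheritlib=(work=cwork);
         data cwork.result&i;
            set cwork.bigdata;                    /* hypothetical input */
            where mod(id, &ntasks) = %eval(&i - 1);
            /* heavy per-row computation goes here */
         run;
      endrsubmit;
   %end;

   waitfor _all_ %do i = 1 %to &ntasks; task&i %end;;
   signoff _all_;
%mend fan_out;

/* Re-run with different counts and compare the log timings */
%fan_out(ntasks=5)
```

Because the spawned sessions inherit the client’s WORK library, the result1, result2, … slices land where a follow-up DATA step can concatenate them once WAITFOR returns.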
In Advanced Multithreading Techniques for Performance Improvement, Viraj Kumbhakarna uses an empirical approach, basing his recommendations on a case study that he ran both serially and in parallel using the multithreading capabilities of MP CONNECT. Kumbhakarna found that, as a rule of thumb, the processing time for parallelized tasks can improve by a factor no greater than the number of CPUs available for processing. When he broke test programs into more subtasks than there were CPUs available, processing performance actually degraded.
Kumbhakarna also noted that in his research, multithreading yielded higher returns where CPU time and real elapsed time were not far apart. Like Fuller, he points out that the number of threads into which jobs are broken is an important factor in performance improvement. He suggests that programmers can determine the most effective number of subtasks by executing each job multiple times and selecting the job (and its number of subtasks) with the least difference between real time and CPU time.
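To measure the gap Kumbhakarna uses as a selection criterion, you can turn on the standard FULLSTIMER system option so the log reports both real and CPU time for every step. A small sketch (the PROC SORT is just a stand-in for whatever step you are timing):

```sas
options fullstimer;   /* log real time, user CPU and system CPU per step */

proc sort data=work.bigdata out=work.sorted;   /* hypothetical step */
   by id;
run;

/* The log then shows lines such as:
      real time           12.34 seconds
      user cpu time       11.80 seconds
   A small gap between the two suggests the step is CPU-bound,
   which is where Kumbhakarna saw multithreading pay off most. */
```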
Both authors provide lots of sample code showing how to create test data, how to break a program into subtasks and how to submit parallelized programs for processing.
Editor’s Note: You can find both papers and more at the MWSUG 2013 Proceedings.