Kirk Paul Lafler, of Software Intelligence Corporation, has written four SAS books and more than 500 peer-reviewed papers, 19 of which won Best Contributed Paper or Poster awards. So I'm inclined to believe him when he says he's figured out a thing or two about tuning SAS systems.
Lafler acknowledges that many more SAS system and performance tuning techniques exist, but his NESUG paper captures his top ten in each of these five categories:
- CPU: time used to decode and execute programs.
- I/O: time spent performing read and write operations.
- Memory: where data is held while it is being processed.
- Secondary storage: where data is kept (thumb drive, disk drive, optical drive, tape drive).
- Programming: the human element that puts the techniques you have learned into practice.
In his paper, Lafler notes that many common practices reduce efficiency, including keeping unneeded data sets in the WORK library, failing to subset early to eliminate unwanted observations, and reading every variable when only some are needed.
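The inefficiencies listed above have straightforward counterparts. The sketch below is one way to apply them, assuming the SASHELP.CLASS sample data set that ships with SAS; the data set names WORK.TEENS and WORK.TEEN_WEIGHT are hypothetical. It subsets rows and columns as the data are read, then deletes the intermediate data set once it is no longer needed.

```sas
/* Subset early: WHERE= drops unwanted observations and KEEP= drops
   unwanted variables as SASHELP.CLASS is read, not in a later step */
data work.teens;
  set sashelp.class(keep=name age weight
                    where=(age >= 13));
run;

/* Use the intermediate data set (here, mean weight of the subset) */
proc means data=work.teens noprint;
  var weight;
  output out=work.teen_weight mean=mean_weight;
run;

/* Remove the intermediate data set from WORK once it is no longer needed */
proc datasets library=work nolist;
  delete teens;
quit;
```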
To establish a baseline, you can write resource-usage information to the SAS log. Lafler suggests the STIMER and FULLSTIMER system options. STIMER reports real time and CPU time; FULLSTIMER reports real time, user CPU time, system CPU time, operating system memory and a timestamp. "If you need to be able to document, or better understand, what the resources were that expended to run your job or multistep job, these are good ways to get a real-time estimate of what the job used," Lafler explained.
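A minimal sketch of turning these options on for a session. The DATA step is only a placeholder workload that copies the SASHELP.CLASS sample data set to a hypothetical WORK.CLASS_COPY, and the exact statistics FULLSTIMER writes to the log vary by operating system.

```sas
/* Report real time and CPU time for each step in the log */
options stimer;

/* Or request the detailed report: real time, user and system CPU time,
   memory and a timestamp (fields vary by host) */
options fullstimer;

/* Placeholder workload whose resource usage appears in the log */
data work.class_copy;
  set sashelp.class;
run;

/* Turn the detailed report off again when finished */
options nofullstimer;
```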
Lafler said the top-ten lists in the paper are not ranked by personal preference; he chose techniques that "give you the biggest bang for the buck." His top ten in each category can be found in the paper, so I'm not going to steal his thunder. Here's a quick peek at one of the techniques Lafler shared.
One CPU technique is the KEEP= data set option. "The KEEP= data set option reduces the width of the data set to only those columns or variables that you are interested in," he said. KEEP= instructs SAS to load only the specified variables into the program data vector (PDV), so no other variables are loaded.
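"We're talking about using it, preferably on the SET [statement], the read process of the DATA step. You specify only those variables that you want in this process. You can use the DROP=, [but] you aren't telling the user what variables are going to be used," said Lafler. In other words, DROP= can also narrow the data, but only KEEP= documents exactly which variables the step uses.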
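A minimal sketch of the KEEP= versus DROP= contrast, assuming the SASHELP.CLASS sample data set (variables Name, Sex, Age, Height and Weight); the output data set names are hypothetical. Both steps produce the same two columns, but only the KEEP= version tells the reader which variables the step actually uses.

```sas
/* KEEP= on the SET statement: only Name and Height are read into the PDV */
data work.heights;
  set sashelp.class(keep=name height);
run;

/* DROP= yields the same two columns here, but a reader must know every
   variable in SASHELP.CLASS to work out which ones survive */
data work.heights_drop;
  set sashelp.class(drop=sex age weight);
run;
```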
He cautions that tuning involves trade-offs. "If you improve one aspect of your resource utilization, you might offset another. Oftentimes, you have to decide which resource area is in most demand or in shortest supply for your organization or your department," he said.
What performance tips do you have?