When SAS is used to analyze large volumes of data (in the gigabytes), it reads and writes the data using large-block sequential IO. To get optimal performance from the hardware for this IO, we strongly suggest that you review the information below to ensure that the infrastructure (CPUs, memory, IO subsystem) is configured as optimally as possible.
Operating-system tuning. Tuning guidelines for working with SAS on various operating systems can be found in SAS Usage Note 53873.
CPU. SAS recommends the use of current generation processors whenever possible for all systems.
Memory. For each tier of the environment, SAS recommends the following minimum memory guidelines:
- SAS Compute tier: A minimum of 8GB of RAM per core
- SAS Middle tier: A minimum of 24GB, or 8GB of RAM per core, whichever is larger
- SAS Metadata tier: A minimum of 8GB of RAM per core
It is also important to understand the amount of virtual memory that is required in the system. SAS recommends that virtual memory be 1.5 to 2 times the amount of physical RAM. If monitoring shows that the machine is paging heavily, SAS recommends either adding more memory or moving the paging file to a drive with higher IO throughput than the default drive. In some cases, both of these steps may be necessary.
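The memory guidelines above reduce to simple arithmetic. The following is a minimal sketch of that arithmetic in Python; the function names `min_ram_gb` and `virtual_memory_range_gb` are hypothetical and are not part of any SAS tool.

```python
from typing import Tuple


def min_ram_gb(tier: str, cores: int) -> int:
    """Minimum RAM in GB for one tier, following the guidelines above."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    per_core_total = 8 * cores  # 8GB of RAM per core
    if tier in ("compute", "metadata"):
        return per_core_total
    if tier == "middle":
        # 24GB or 8GB per core, whichever is larger
        return max(24, per_core_total)
    raise ValueError(f"unknown tier: {tier!r}")


def virtual_memory_range_gb(physical_ram_gb: float) -> Tuple[float, float]:
    """Recommended virtual memory: 1.5 to 2 times the physical RAM."""
    return (1.5 * physical_ram_gb, 2.0 * physical_ram_gb)


if __name__ == "__main__":
    # Example core counts are illustrative assumptions only.
    for tier, cores in [("compute", 16), ("middle", 2), ("metadata", 4)]:
        ram = min_ram_gb(tier, cores)
        low, high = virtual_memory_range_gb(ram)
        print(f"{tier}: {cores} cores -> at least {ram} GB RAM, "
              f"{low:.0f}-{high:.0f} GB virtual memory")
```

Note that the 24GB floor for the middle tier only matters on machines with fewer than three cores; from three cores up, the per-core figure dominates.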
IO configuration. Configuring the IO subsystem (disks within the storage, adaptors coming out of the storage, interconnect between the storage and processors, input into the processors) to deliver the IO throughput recommended by SAS will keep the processors busy, allow the workloads to execute without delays, and make the SAS users happy. Here are the recommended IO throughput rates for the typical file systems required by the SAS Compute tier:
- Overall IO throughput needs to be a minimum of 100-125 MB/sec/core.
- For SAS WORK, a minimum of 100 MB/sec/core
- For permanent SAS data files, a minimum of 50-75 MB/sec/core
For more information regarding how SAS does IO, please review the Best Practices for Configuring your IO Subsystem for SAS® 9 Applications (Revised May 2014) paper.
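To see what the per-core throughput figures listed above mean for a whole server, they can be multiplied by the core count. This is a small sketch of that calculation; the names `PER_CORE_MB_S` and `required_throughput` are hypothetical, and a single-valued minimum (SAS WORK) is represented as a range whose two ends are equal.

```python
# Per-core minimum throughput in MB/sec, taken from the list above.
# Each entry is (low end, high end) of the recommended minimum.
PER_CORE_MB_S = {
    "overall": (100, 125),
    "SAS WORK": (100, 100),
    "permanent SAS data": (50, 75),
}


def required_throughput(cores: int) -> dict:
    """Scale the per-core minimums to a machine with `cores` cores."""
    if cores < 1:
        raise ValueError("cores must be a positive integer")
    return {
        name: (low * cores, high * cores)
        for name, (low, high) in PER_CORE_MB_S.items()
    }


if __name__ == "__main__":
    # A 16-core compute server is an illustrative assumption.
    for name, (low, high) in required_throughput(16).items():
        if low == high:
            print(f"{name}: at least {low} MB/sec")
        else:
            print(f"{name}: at least {low}-{high} MB/sec")
```

For the 16-core example this works out to 1600-2000 MB/sec overall, which is why the whole path from disks to processors has to be sized, not just the disks.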
IO throughput. Additionally, it is a good idea to establish baseline IO capabilities before end users begin placing demands on the system. The baseline also supports monitoring the IO if end users later report changes in performance. To test the IO throughput, platform-specific scripts are available:
- Testing Throughput for your SAS 9 File Systems: UNIX and Linux platforms
- Testing Throughput for your SAS 9 File Systems: Microsoft Windows platforms
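The scripts listed above are the recommended way to test. Purely to illustrate what a baseline sequential-write measurement involves, here is a rough sketch in Python. It is not the SAS-provided script; the function name `sequential_write_mb_s` and its default sizes are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def sequential_write_mb_s(directory: str, total_mb: int = 256,
                          block_kb: int = 1024) -> float:
    """Write total_mb of zeros sequentially in block_kb blocks to a
    temporary file in `directory`, force it to disk, and return MB/sec."""
    n_blocks = (total_mb * 1024) // block_kb
    if n_blocks < 1:
        raise ValueError("total_mb must be at least one block in size")
    block = b"\0" * (block_kb * 1024)
    fd, path = tempfile.mkstemp(dir=directory, prefix="io_test_")
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # include the time to reach the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # always clean up the test file
    written_mb = n_blocks * block_kb / 1024
    return written_mb / elapsed


if __name__ == "__main__":
    # Small size so the demo finishes quickly; see the note below.
    rate = sequential_write_mb_s(tempfile.gettempdir(), total_mb=32)
    print(f"sequential write: {rate:.0f} MB/sec")
```

A test file this small can be absorbed by operating-system and storage caches, so the number it prints is only indicative. A meaningful baseline needs a much larger file, a read pass as well as a write pass, and a run against each file system that SAS will actually use.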
File system. The Best Practices for Configuring IO paper above lists the preferred local file systems for SAS (i.e. JFS2 for AIX, XFS for RHEL, NTFS for Windows). Specific tuning for these file systems can be found in the operating-system tuning papers referenced above.
For SAS Grid Computing implementations, a clustered file system is required. SAS has tested SAS Grid Manager with many file systems, and the results of that testing, along with any available tuning guidelines, can be found in the paper A Survey of Shared File Systems (updated August 2013). In addition to this overall paper, there are more detailed papers on Red Hat’s GFS2 and IBM’s GPFS clustered file systems in SAS Usage Note 53875.
SAS WORK (the temporary file system for SAS applications) does large sequential reads and writes and then destroys its files when the SAS session terminates. Because of this, SAS does not recommend NFS-mounted file systems for SAS WORK. There is a history of file-locking issues on NFS systems, and the network can degrade SAS performance when files are accessed across it, especially for writes.
Storage array. Storage arrays play an important part in the IO subsystem infrastructure. SAS has several papers on tuning guidelines for various storage arrays, available through SAS Usage Note 53874.
Miscellaneous. In addition to the above information, there are some general papers on how to set up the infrastructure to best support SAS. These are available for your review:
- Grand Designs: Why It Pays to Think About Technical Architecture Design Before You Act
- How to Maintain Happy SAS®9 Users (revised June 2014)
- A Guide to SAS® for the IT Organization
- Top 10 Resources Every SAS® Administrator Should Know About
- Guidelines for Preparing your Computer Systems for SAS
- SAS Administrators Blog
Finally, SAS recommends regular monitoring of the environment to ensure ample compute resources for SAS. Additional papers are available that provide guidelines for appropriate monitoring. These can be found in SAS Usage Note 53877.
2 Comments
Anybody out there use a high-capacity laptop to process extremely large data files using SAS? I'm being told by IT that no such thing exists, but the alternative they are offering is what I currently have, which has very slow run times. I've worked through SAS tech support but am told to talk to our IT folks. I talk to our IT folks and am told it's the best they can do. I provided them all the excellent guidance that SAS folks have written, but I'm hitting a brick wall. Would appreciate any advice out there from SAS users who either use a laptop or a desktop (secured/encrypted/etc) to process large amounts of data (max file about 5 GB, 20 million rows and 500 columns). I do select and reduce the data early on, but that process alone takes half a day. I'm looking to speed that up. Thanks for any help and advice. I realize we probably just need better IT support, but as a small organization we're often stuck with what we have.
I would venture to guess that you do not have enough IO throughput and/or memory with the single hard drive that you have in your laptop to perform your task in the time frame you would like for it to. Have you looked at attaching some SSD drives via USB ports to your laptop to add additional IO throughput?