Now that memory is affordable, customers constantly ask us about doubling or tripling the amount of RAM that SAS recommends. More is better, right?
Often, but we have found a specific scenario involving SAS data sets where that is not the case. Remember that increasing memory generally means a commensurate increase not only in operational and computational memory, but in the host system file cache as well. The host system file cache is the portion of memory through which SAS pages all of its reads and writes to and from storage.
Consider a case that arose recently. A customer had a host with a lot of memory and, hence, hundreds of gigabytes of host system file cache, all to himself. This was a quiet system on which he enjoyed the spoils of excess.
Populating a large host cache with very large SAS files can backfire. Consider the SAS program:
DATA newds; SET testds; RUN;
DATA newds; SET newds; RUN;
The first DATA step creates a new file by reading an existing one with the SET statement. The second DATA step updates the data set newds “in place.” Although the two steps appear to perform the same type of operation, the second runs significantly longer than the first. The reason is that the newds.sas7bdat.lck file, which the DATA statement creates in SAS WORK, cannot be committed and closed as newds.sas7bdat until all the data associated with the original newds.sas7bdat named in the SET statement has been flushed from the file cache (i.e., RAM). The pages of the original newds.sas7bdat that reside in the host cache are not updated; they are used to create a copy of the file in the new locked file newds.sas7bdat.lck. So the new file cannot be committed until the pages of the old file with the same name have been flushed from the host cache and the original file has been deleted on storage.
If this file is hundreds of gigabytes in size and most of its pages reside in the host system file cache, this flush can take a considerable amount of time, much longer than the simple rename of the file that completes the first DATA step above. In the second DATA step, the original file must be emptied from cache by the page flush daemons, and removed from storage, so that the newds.sas7bdat.lck version can replace it and be closed and committed to storage.
So, very large SAS data files that fit into the host system file cache, and that have to be flushed before SAS can update the file under the same name, can lead to much longer response times for that operation. The delay is commensurate with the size of the file and with how many of its pages reside in the host cache. For very large files, it is generally not a good idea to update a file “in place,” i.e., to rewrite it under the same name, if you want to avoid this type of behavior.
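As a minimal sketch of the alternative, the second step of the earlier program could write its output to a different data set name rather than back onto newds. The name newds_v2 is hypothetical and chosen only for illustration; testds and newds are the data sets from the example above. This assumes downstream code can be pointed at the new name.

```sas
/* Sketch only: newds_v2 is a hypothetical name used for illustration. */

/* Step 1: unchanged, creates newds from the existing testds */
DATA newds;
  SET testds;
RUN;

/* Step 2: write to a different name instead of rewriting newds in place, */
/* so the output does not have to replace a file of the same name         */
DATA newds_v2;
  SET newds;
RUN;
```

Later steps would then read newds_v2 instead of newds. How much time this saves depends on the size of the file and on how many of its pages sit in the host cache, as described above.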