Last week, SAS released the 14.1 version of its analytics products, which are shipped as part of the third maintenance release of 9.4. If you run SAS/IML programs from a 64-bit Windows PC, you might be interested to know that you can now create matrices with about 2^31 ≈ 2 billion elements, provided that your system has enough RAM. (On Linux operating systems, this feature has been available since SAS 9.3.)
Each element of a numerical matrix is stored as an 8-byte double-precision value, so a numerical matrix with 2 billion elements requires 16 GB of RAM. In terms of matrix dimensions, this corresponds to a square numerical matrix that has approximately 46,000 rows and columns. I've written a handy SAS/IML program to determine how much RAM is required to store a matrix of a given size.
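The arithmetic behind these numbers is short enough to sketch. The following SAS/IML program is a minimal sketch, not the program mentioned above: it assumes 8 bytes per element and ignores the small amount of overhead that SAS/IML uses for each matrix. The module name MatrixGB is hypothetical.

```sas
proc iml;
/* Hypothetical helper: GB of RAM needed to store an r x c numeric matrix.
   Assumes 8 bytes per double-precision element; ignores overhead. */
start MatrixGB(r, c);
   return( 8 # r # c / 2##30 );     /* 1 GB = 2##30 bytes */
finish;

n  = {16000, 25000, 46000};         /* dimensions of square matrices */
GB = MatrixGB(n, n);                /* elementwise operators, so GB is a column vector */
print n GB[format=6.2];
quit;
```

Because the module uses elementwise operators, you can pass a vector of sizes. For the three sizes above the results are about 1.91, 4.66, and 15.77 GB.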
If you are running 64-bit SAS on Windows, this article describes how to set an upper limit for the amount of memory that SAS can allocate for large matrices.
The MEMSIZE option
The amount of memory that SAS can allocate depends on the value of the MEMSIZE system option, which has a default value of 2 GB on Windows. Many SAS sites do not override the default value, which means that SAS cannot allocate more than 2 GB of system memory.
You can run PROC OPTIONS to display the current value of the MEMSIZE option.
proc options option=memsize value; run;
    Option Value Information For SAS Option MEMSIZE
        Value: 2147483648
        Scope: SAS Session
        How option value set: Config File
        Config file name:
                C:\Program Files\SASHome\SASFoundation\9.4\nls\en\sasv9.cfg
The SAS log shows the value 2,147,483,648. Unfortunately, the value is displayed in bytes: this number is 2^31 bytes, or 2 GB. Unless you change the MEMSIZE option, you will not be able to allocate a square matrix with more than about 16,000 rows and columns. For example, unless SAS can allocate 5 GB or more of RAM, the following SAS/IML program will produce an error message:
proc iml;
/* allocate 25,000 x 25,000 matrix, which requires 4.7 GB */
x = j(25000, 25000, 0);
ERROR: Unable to allocate sufficient memory.
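The "about 16,000" limit comes from dividing the MEMSIZE value by 8 bytes per element and taking the square root. The following DATA step is a small sketch of that calculation. It ignores the memory that SAS itself uses, so the practical limit is somewhat lower than the computed value.

```sas
data _null_;
   memsize = 2147483648;                 /* MEMSIZE value from PROC OPTIONS, in bytes */
   GB      = memsize / 1024**3;          /* convert bytes to GB (1 GB = 2**30 bytes) */
   maxN    = floor(sqrt(memsize / 8));   /* largest n x n numeric matrix at 8 bytes/element */
   put GB= maxN=;
run;
```

The log shows GB=2 and maxN=16384, so a 25,000 x 25,000 matrix cannot fit.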
You can use the MEMSIZE system option to permit SAS to allocate a greater amount of system memory. SAS does not grab this memory and hold onto it. Instead, the MEMSIZE option specifies a maximum value for dynamic allocations.
The MEMSIZE option only applies when you launch SAS, so if SAS is currently running, save your work and exit SAS before continuing.
Changing the command-line invocation for SAS
If you run SAS locally on your PC, you can add the -MEMSIZE command-line option to the shortcut that you use to invoke SAS. This example uses "12G" to permit SAS to allocate up to 12 GB of RAM, but you can use different numbers, such as 8G or 16G.
- Locate the "SAS 9.4" icon on your Desktop or the "SAS 9.4" item on the Start menu.
- Right-click the shortcut and select Properties.
- A dialog box appears. Edit the Target field and insert -MEMSIZE 12G at the end of the current text, as shown in the image.
- Click OK.
Every time you use this shortcut to launch SAS, the SAS process can allocate up to 12 GB of RAM. You can also specify -MEMSIZE 0, which permits allocations up to 80% of the available RAM. Personally, I do not use -MEMSIZE 0 because it permits SAS to consume most of the system memory, which does not leave much for other applications. I rarely permit SAS to use more than 75% of my RAM.
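If you follow a rule of thumb such as "no more than 75% of RAM," the following DATA step sketch converts the physical RAM in your PC into a -MEMSIZE value. The value of 16 GB for the RAM is hypothetical; substitute the amount in your own machine.

```sas
data _null_;
   ramGB   = 16;                  /* hypothetical: physical RAM in the PC, in GB */
   limitGB = floor(0.75 * ramGB); /* cap SAS at about 75% of RAM */
   bytes   = limitGB * 1024**3;   /* value that PROC OPTIONS reports for MEMSIZE */
   put "Use -MEMSIZE " limitGB +(-1) "G, which PROC OPTIONS reports as " bytes;
run;
```

For 16 GB of RAM, this suggests 12G, the value used in this article. Because 12 GB is 12 x 2^30 = 12,884,901,888 bytes, that is the number PROC OPTIONS displays after the change.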
After editing the shortcut, launch SAS and call PROC OPTIONS. This time you should see something like the following:
    Option Value Information For SAS Option MEMSIZE
        Value: 12884901888
        Scope: SAS Session
        How option value set: SAS Session Startup Command Line
SAS configuration files
A drawback of the command-line approach is that it only applies to a SAS session that is launched from the shortcut that you modified. In particular, it does not apply to launching SAS by double-clicking on a .sas or .sas7bdat file.
An alternative is to create or edit a configuration file. The SAS documentation has detailed instructions about how to edit the sasv9.cfg file that sets the system options for SAS when SAS is launched.
SAS 9 creates two default configuration files during installation. Both configuration files are named SASV9.CFG. I suggest that you edit the one in !SASHOME\SASFoundation\9.4, which on many installations is c:\program files\SASHome\SASFoundation\9.4. By default, that configuration file has a -CONFIG option that points to a language-specific configuration file. Put the -MEMSIZE option and any other system options after the -CONFIG option, as follows:
-config "C:\Program Files\SASHome\SASFoundation\9.4\nls\en\sasv9.cfg"
-RLANG
-MEMSIZE 12G
Notice that I also put the -RLANG option in this sasv9.cfg file. The -RLANG system option specifies that SAS/IML software can interface with the R language.
If you now double-click on a .sas file to launch SAS, PROC OPTIONS reports the following information:
    Option Value Information For SAS Option MEMSIZE
        Value: 12884901888
        Scope: SAS Session
        How option value set: Config File
        Config file name:
                C:\Program Files\SASHome\SASFoundation\9.4\SASV9.CFG
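Because this configuration file sets more than one option, you might want to check them together. The following sketch assumes that your release accepts a parenthesized list of option names in the OPTION= option of PROC OPTIONS, as SAS 9.4 does.

```sas
/* Display MEMSIZE and RLANG in one call; for each option, the
   "How option value set" line should point to the edited SASV9.CFG */
proc options option=(memsize rlang) value;
run;
```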
If you add multiple system options to the configuration file, you might want to go back to the SAS 9.4 Properties dialog box (in the previous section) and edit the Target value to point to the configuration file that you just edited.
Remote SAS servers
If you connect to a remote SAS server and submit SAS/IML programs through SAS/IML Studio, SAS Enterprise Guide, or SAS Studio, a SAS administrator has probably provided a configuration file that specifies how much RAM can be allocated by your SAS process. If you need a larger limit, discuss the situation with your SAS administrator.
Final thoughts on big matrices
You can create SAS/IML matrices that have millions of rows and hundreds of columns. However, you need to recognize that many matrix computations scale cubically with the dimension of the matrix. For example, many computations on an n x n matrix, such as inverting it or solving a linear system, require on the order of n^3 floating-point operations. Consequently, although you might be able to create extremely large matrices, computing with them can be very time consuming.
In short, allocating a large matrix is only the first step. The wise programmer will time a computation on a sequence of smaller problems as a way of estimating the time required to tackle The Big Problem.
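Here is one way to carry out that strategy. The following SAS/IML program is a sketch under stated assumptions: it uses matrix inversion as the O(n^3) operation, and the sizes (500 to 2,000) and the target size bigN = 25,000 are illustrative choices. Timings for small n are noisy, so the calibration uses the largest trial.

```sas
proc iml;
/* Sketch: time an O(n##3) operation (matrix inversion) on a sequence of
   smaller problems, then extrapolate to a larger size.
   The sizes and bigN are illustrative, not recommendations. */
call randseed(12345);
sizes   = do(500, 2000, 500);          /* n = 500, 1000, 1500, 2000 */
seconds = j(ncol(sizes), 1, .);        /* elapsed time for each n */
do i = 1 to ncol(sizes);
   n = sizes[i];
   A = j(n, n);
   call randgen(A, "Normal");          /* random n x n matrix (almost surely nonsingular) */
   t0 = time();
   Ainv = inv(A);
   seconds[i] = time() - t0;
end;

/* If time is proportional to n##3, then seconds / n##3 is roughly constant */
k = ncol(sizes);
c = seconds[k] / sizes[k]##3;          /* calibrate on the largest (least noisy) run */
bigN = 25000;
estSeconds = c # bigN##3;              /* predicted time for a bigN x bigN inverse */
nVals = sizes`;
print nVals seconds;
print bigN estSeconds;
quit;
```

Because the cost grows like n^3, doubling n multiplies the time by about 8, and going from n = 2,000 to n = 25,000 multiplies it by (25000/2000)^3 ≈ 1,950. If the ratio seconds / n^3 is not roughly constant across the trials, the simple cubic model does not fit your computation well and the estimate should be treated with caution.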