For several years, there has been interest in calling R from SAS software, primarily because of the large number of special-purpose R packages. The ability to call R from SAS has been available in SAS/IML since 2009. Previous blog posts about R include a video on how to call R from the SAS/IML language and a detailed example of calling R and importing the results back into SAS. The SAS/IML interface enables you to embed tabular output from R into SAS reports and to transfer matrices and data frames from R into SAS. You can display R graphics in the native R graphics window or tell R to write its graphics as an image file.
Recently I was asked about SAS macros that call R, such as the free %PROC_R macro by Xin Wei, or macros that were written by Phil Holland or Phil Rack. In addition to these general-purpose macros, several programmers have described how to call R from SAS for special purposes. See the examples by Charlie Huang on his SAS Analysis blog and by Liang Xie on his blog. These macros do not require using the SAS/IML language, so they might be a reasonable choice for SAS customers who do not have a license for the SAS/IML product.
The %PROC_R macro, which is described by Wei (2012, p. 8), is typical of a macro that calls R. It implements the following steps:
- The SAS DATA step writes an R script to a text file.
- Any SAS data sets that should be made available to R are converted to a form that is readable by R, such as a CSV file. (Other macros might try to read the SAS file directly.)
- The macro calls the R executable to execute the R file in batch mode, either by using a pipe or by using a SAS statement that calls the operating system, such as the X statement or the %SYSEXEC statement.
- Optionally, R results are made available to SAS, usually by writing them to a CSV file.
- Optionally, R output is sent to a SAS Output window and error information is sent to the SAS Log.
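Here is a minimal sketch of that macro-style workflow, written in Base SAS without a macro wrapper. It is not the %PROC_R source code. It assumes a UNIX-like system on which the Rscript executable is on the PATH, that /tmp is writable, and that the XCMD option is enabled so that the X statement can run. The file names are hypothetical.

```sas
/* Hypothetical sketch of the macro-style approach (not the %PROC_R source).   */
/* Assumes a UNIX-like OS, Rscript on the PATH, writable /tmp, XCMD enabled.   */

/* Convert the SAS data to a CSV file that R can read */
proc export data=Sashelp.Class outfile="/tmp/class.csv" dbms=csv replace;
run;

/* Use the DATA step to write an R script to a text file */
data _null_;
   file "/tmp/fit.R";
   put 'df <- read.csv("/tmp/class.csv")';
   put 'fit <- lm(Weight ~ Height, data = df)';
   put 'est <- data.frame(term = names(coef(fit)), estimate = unname(coef(fit)))';
   put 'write.csv(est, "/tmp/coef.csv", row.names = FALSE)';
run;

/* Launch a new R process in batch mode; R exits when the script ends */
x 'Rscript /tmp/fit.R';

/* Read the R results back into SAS */
proc import datafile="/tmp/coef.csv" out=work.Coef dbms=csv replace;
run;
```

Every run of this program starts and stops a fresh R process, which is the overhead that the SAS/IML interface avoids.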
What is the value of the SAS/IML interface to R?
In light of these macros, why should you use the SAS/IML interface to R? The answer is that the SAS/IML interface contains many features that are not available if you use a SAS macro to call R. Here are a dozen advantages to using the SAS/IML interface to call R:
- When you call R from the SAS/IML language, the R session persists until you quit PROC IML. The R session—including all existing R functions and variables—remains active, which means that you can call R multiple times or even call R within a SAS/IML DO loop.
```sas
proc iml;
submit / R;
   x <- 1;
endsubmit;
do i = 1 to 10;          /* call R in a loop; variables persist between calls */
   submit / R;
      x <- 2*x;
   endsubmit;
end;
run ImportMatrixFromR(TwoPower10, "x");
print TwoPower10;
```
This feature is very powerful. It means that at the top of your SAS/IML program you can load your favorite R packages and functions (perhaps by using a single %INCLUDE call) and they will be available when you call R. The looping feature means that you can apply an R method to many data sets or vectors, or you can write an iterative algorithm in SAS/IML that calls R during each step of the iteration. The first call to R launches the R process; subsequent calls use the process that is already running. This is in stark contrast to the macro approach, for which each call to R starts and exits R, which leads to a lot of "overhead" costs. Consequently, if you need to call R multiple times, the SAS/IML interface has much better performance.
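To illustrate loading a package once and then applying an R method to many vectors, the following sketch loads the MASS package (a recommended package that ships with R) in the first SUBMIT block and then calls its huber function for each numeric variable in Sashelp.Class. The choice of huber and of these variables is only for illustration.

```sas
proc iml;
/* Load the package once; it stays loaded for every later SUBMIT / R block */
submit / R;
   library(MASS)
endsubmit;

use Sashelp.Class;
read all var {Height Weight Age} into X[colname=varNames];
close Sashelp.Class;

est = j(1, ncol(X), .);               /* one robust estimate per variable */
do j = 1 to ncol(X);
   v = X[, j];
   run ExportMatrixToR(v, "v");       /* send the j_th column to the running R session */
   submit / R;
      mu <- huber(v)$mu               # Huber M-estimate of location (MASS package)
   endsubmit;
   run ImportMatrixFromR(m, "mu");
   est[j] = m;
end;
print est[colname=varNames];
```

Only the first SUBMIT block pays the cost of starting R and loading MASS; the ten-line loop reuses that session.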
- Missing values are automatically converted from SAS to R and from R to SAS. The following SAS/IML session sends a vector to R that contains missing values. It also reads an R array that contains R missing values. The printed output shows that all missing values are handled in a robust way.
```sas
x = {0 1 . .I .M};                      /* .I --> Inf; .M --> -Inf */
run ExportMatrixToR(x, "Rx");
submit / R;
   print( Rx )
   Ry = t(c(0,1,NA,Inf,-Inf,NaN))       # NaN --> .
endsubmit;
run ImportMatrixFromR(y, "Ry");
print y;
```
- You can pass parameters from SAS to R. This is very useful because it enables you to communicate options, such as the names of the analysis variables, to R. Last week I showed how to use this feature to construct a general-purpose SAS/IML module that passes arguments to R.
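Here is a minimal sketch of parameter passing. Matrices that are listed on the SUBMIT statement are substituted into the R code wherever their names appear with an ampersand, so the model formula below is built from SAS/IML character matrices at run time. The variables and the use of lm are only an example.

```sas
proc iml;
run ExportDataSetToR("Sashelp.Class", "df");
YVar = "Weight";                          /* analysis variables chosen at run time */
XVar = "Height";
submit YVar XVar / R;
   fit <- lm(&YVar ~ &XVar, data = df)    # R receives lm(Weight ~ Height, data = df)
   print(coef(fit))
endsubmit;
```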
- You can call R and then resume execution of a SAS/IML program. Because you can also call SAS procedures and DATA steps from within the SAS/IML language, the SAS/IML language can serve as "glue" to drive an analysis that uses SAS/IML functions, SAS procedures, and R functions.
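The following sketch shows the "glue" idea. A SUBMIT block without the R option runs ordinary SAS statements (here PROC STANDARD), and the program then passes the procedure's output data set to R and brings a result back into SAS/IML. The procedure and the statistic are arbitrary choices for illustration.

```sas
proc iml;
/* A SUBMIT block without "/ R" runs SAS statements */
submit;
   proc standard data=Sashelp.Class mean=0 std=1 out=work.Std;
      var Height Weight;
   run;
endsubmit;

/* Hand the procedure's output to R, then bring the answer back to SAS/IML */
run ExportDataSetToR("work.Std", "std");
submit / R;
   r <- cor(std$Height, std$Weight)
endsubmit;
run ImportMatrixFromR(r, "r");
print r;
```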
- You can easily transfer data in both directions. You can do this multiple times within your program, and you have complete control over the sequence of transfers. The interactive SAS/IML language enables you to dynamically specify the names of data frames and matrices at run time. Furthermore, the SAS/IML interface does not use CSV files to transfer data. Because CSV files are slow to read and write, the SAS/IML interface is faster at transferring data than a macro that uses CSV files as an intermediate format. The SAS/IML interface is also more accurate, since it reads and writes binary values rather than converting between text and binary.
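As a short sketch of transfers whose names are decided at run time, the SAS data set names and the R data frame names below are stored in SAS/IML character matrices, and a result is then sent back to a SAS data set. The data sets and the names dfClass, dfCars, and sizes are hypothetical choices.

```sas
proc iml;
dsNames = {"Sashelp.Class" "Sashelp.Cars"};   /* SAS data sets to send */
rNames  = {"dfClass"       "dfCars"};         /* names of the R data frames */
do i = 1 to ncol(dsNames);
   run ExportDataSetToR(dsNames[i], rNames[i]);
end;
submit / R;
   sizes <- data.frame(rows = c(nrow(dfClass), nrow(dfCars)))
endsubmit;
run ImportDataSetFromR("Work.Sizes", "sizes");   /* R data frame --> SAS data set */
```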
- Date, time, and datetime values are automatically converted.
```sas
run ExportDataSetToR("Sashelp.Air", "Air");
run ExportDataSetToR("Sashelp.TimeData", "TD");
submit / R;
   class(Air$DATE)       # assigned to 'Date' class
   class(TD$datetime)    # assigned to 'POSIXt' and 'POSIXct' classes
endsubmit;
```
- The ImportDataSetFromR call automatically converts R names to valid SAS names. (R permits names of variables that are not valid variable names in SAS.)
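To see the name conversion for yourself, you can create an R data frame whose column names are not valid SAS names under the default VALIDVARNAME=V7 setting and then inspect the resulting SAS data set. This sketch does not assume what the converted names will be; PROC CONTENTS displays them.

```sas
proc iml;
submit / R;
   res <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))
   names(res) <- c("my var", "x.1")   # a space and a period: not valid V7 SAS names
endsubmit;
run ImportDataSetFromR("Work.Res", "res");
submit;
   proc contents data=Work.Res varnum;   /* lists the converted variable names */
   run;
endsubmit;
```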
- In contrast to the SAS macros, the SAS/IML interface to R does not use the X statement to issue commands to the operating system. The programs you write in the SAS/IML language are portable: they run on any operating system that supports both SAS and R. This is important because some SAS administrators disable calls to the operating system from within SAS. If this is the case at your site, the macros that use the X statement or pipes will not work.
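If you are not sure how your site is configured, a quick check is to query two system options: XCMD, which controls whether the X statement and pipes can run operating system commands, and RLANG, which must be enabled for SAS/IML to call R. Both options are set when SAS starts, so an administrator might need to change them.

```sas
/* Writes XCMD or NOXCMD, and RLANG or NORLANG, to the SAS log */
%put XCMD setting:  %sysfunc(getoption(xcmd));
%put RLANG setting: %sysfunc(getoption(rlang));
```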
- In the SAS/IML Studio environment, SAS and R can run on different machines. You can run SAS on a huge server and run R on your local PC. Even though the two software packages are running on different machines, the functions for data transfer work without modification. This can be an advantage for researchers who work for a large corporation. The macro approach requires that R be installed on the SAS server and that the installation of R contain all packages that might be used by any analyst. In contrast, SAS/IML Studio enables each analyst to install a local copy of R with only the packages that he or she needs. Analysts can upgrade their version of R or their collection of packages at any time without disrupting the work of their colleagues.
- In many situations, you can interrupt a long-running R computation by clicking on the usual "Break" icon on the SAS GUI toolbar. This interrupts the R computation without killing the entire SAS process. After you regain control of the program, you can save your program, modify it, and resubmit it.
- Along the same lines, the OK= option in the SUBMIT statement enables you to handle errors that occur in R. Depending on the severity of the error, you can either continue processing or choose to write an error message and abort the program. For a simple example, the following program creates an intentional error in the R program and prints an error message:
```sas
submit / R OK=isOK;
   y <- VarDoesNotExist          # cause an error
endsubmit;
if ^isOK then do;
   free Answer;                  /* set Answer to empty matrix */
   print "An error occurred in the R computation";
end;
else run ImportMatrixFromR(Answer, "y");   /* retrieve answer from R */
```
- As with all SAS-supported features, free technical support is just a phone call or email away. If you are confused or your program is not behaving in the way that you expect, call SAS Technical Support for assistance.
So there you have it, a dozen reasons to use the SAS/IML interface to R. Although it is theoretically possible for a SAS macro to add features such as converting missing values, the most powerful features (passing parameters and calling R within a SAS/IML DO loop) are unique to the SAS/IML interface. The SAS/IML interface to R greatly enhances the ways that SAS and R can work together.
7 Comments
Rick:
I particularly agree with your first point. The macro approach executes the whole job from start to end. If you make a minor change in your R code, you have to redo the whole analysis, which can be frustrating if you are dealing with a large workflow. I mentioned this limitation at the end of my paper. By the way, it is always a pleasure to read your blog. Keep up the good work!
thanks
Rick and Xin Wei,
Thank you both for the tip on how to pass R data to SAS without SAS/IML. I will give it a try.
Rick,
I have been trying to sell the CDC on the merits of the SAS/R interface.
I have found that it works as you have described in your blogs.
However, I have found a 'gotcha'.
The problem is that the memory allocated to a Web-based RStudio environment usually exceeds what is available to a desktop SAS/R interface.
As a result, it can appear that Web-based RStudio can read a bigger file and process it faster than a SAS/R setup.
Can you describe what I have to do in SAS 9.4 to increase the memory and other resources so that a SAS/R program can approach the performance of a Web-based RStudio program?
This would also apply to a Server/Web based SAS.
Thanks,
Paul C. McMurray
Analytic Tools and Methods Branch
Division of Epidemiology, Analysis, and Library Services
Center for Surveillance, Epidemiology, and Laboratory Services
Centers for Disease Control and Prevention
MS E-33
I don't fully understand your setup; for best advice, contact SAS Technical Support and describe your configuration. You don't mention SAS/IML Studio, so I assume you are calling R from PROC IML.
When you call R from PROC IML, you are calling R on the same machine that SAS is running on. You say "desktop SAS/R" so I assume you have SAS and R installed on a PC. In that case, SAS and R are both using (and competing for) the same RAM. Modern PCs often support up to 64GB of RAM, and each 16GB of RAM is really cheap (~$75), so check your available RAM and request more RAM if you have less than 32GB. In SAS, you can use the MEMSIZE option to control how much RAM is available for SAS procedures such as IML.
You also mention "Web based R Studio," which I assume is connected to a server somewhere that is running R. The most likely explanation for the difference you are seeing is that the R server is more powerful and has more RAM than the desktop PC.
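As a starting point for the MEMSIZE suggestion, the following sketch reports the current setting. MEMSIZE cannot be changed in a running session; you set it when SAS starts, for example with a line such as -MEMSIZE 16G in the SAS configuration file or on the command line (16G is only an example value).

```sas
/* Show the current MEMSIZE value and how it was set */
proc options option=memsize value;
run;

/* The same value, written to the log on one line */
%put MEMSIZE=%sysfunc(getoption(memsize));
```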
I wonder whether it is possible to send R stdout to the SAS log while an R SUBMIT block is running. I have an R function that outputs, for example, "10%", "20%", "30%" to track its progress. This output ends up in the SAS display window only after the SUBMIT block has ended. Originally I had the simulation in IML and could direct such messages to the log (while the program ran) by using FILE LOG.
Thanks
I am not aware of any way to send R output to the SAS log in real time. However, I have previously written about how to monitor the progress of a long-running SAS/IML program. Because you can call a SUBMIT block in a loop, you could split the R computation into pieces and write a progress message to the SAS log after each piece.
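Here is a sketch of that idea, assuming that the simulation can be split into independent chunks. Because R variables persist between SUBMIT blocks, the results accumulate in R, and SAS/IML writes a message to the log after each chunk by using FILE LOG and PUT, as in the original IML simulation. The chunk count and the rnorm computation are placeholders for the real simulation, and how promptly the log window refreshes might depend on your SAS interface.

```sas
proc iml;
nChunks = 10;                          /* hypothetical number of pieces */
submit / R;
   results <- numeric(0)               # persists across SUBMIT blocks
endsubmit;
file log;                              /* PUT statements now write to the SAS log */
do k = 1 to nChunks;
   submit / R;
      results <- c(results, mean(rnorm(1e6)))   # placeholder for one chunk of work
   endsubmit;
   msg = "Progress: " + strip(char(100*k/nChunks)) + " percent complete";
   put msg;
end;
run ImportMatrixFromR(results, "results");
```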