How to use Git to share SAS programs

This article is about how to use Git to share SAS programs, specifically how to share libraries of SAS IML functions. Some IML programmers might remember an earlier way to share libraries of functions: SAS/IML released "packages" in SAS 9.4m3 (2015), which enable you to create, document, share, and use libraries of IML functions. Unfortunately, the "package" mechanism for SAS 9.4 assumes that programmers can install files on their local SAS workspace server, which is often running on a desktop or laptop PC. This mechanism does not work well for SAS Viya, which runs SAS "in the cloud" on a remote server that is deployed and maintained by an administrator.

Recently, I have been experimenting with using Git to share code with others in a way that will work on SAS Viya as well as on SAS 9.4. Git is not new, and it is familiar to many SAS programmers. In fact, for several years Chris Hemedinger has been promoting the use of Git to manage SAS projects by writing blog posts and giving talks at SAS User Groups. The SAS language has supported functions that interact with Git since SAS 9.4M6. Many interfaces to SAS, including SAS Studio and SAS Enterprise Guide, provide GUI support for Git operations.

Since I am a programmer, I will show how to use Git functions in the DATA step to download and use a SAS IML function that I created in a recent blog post. There are differences between the Git functions in SAS 9.4 and the Git functions in SAS Viya, so I will show both techniques. For this article, I only need one function because my goal is to copy ("clone") files from a remote Git repository.

The task: Download a library of SAS IML functions

The remainder of this article shows how to copy the GitHub repository (a "repo") to a specified location, then use a %INCLUDE statement to include a file into a PROC IML program. On SAS 9.4, I download the repo to my PC. On SAS Viya in the cloud, I download the repo to the WORK libref. The technique in this article is based on Chris Hemedinger's guide to using Git in SAS, which has additional details.

To demonstrate this technique, I show how to include a file that defines the PrintToLog subroutine. The PrintToLog subroutine has been part of SAS IML on SAS Viya since Viya 3.5, but it is not supported in SAS 9.4. However, I like to use the subroutine, so I wrote a module named PrintToLog that reproduces the functionality in SAS 9.4. I wrapped the module in a macro that detects whether the program is running on SAS 9.4 or SAS Viya. On SAS 9.4, the macro defines the module. On SAS Viya, the macro does nothing. Thus, in any version of SAS, you can call the macro and then call the PrintToLog subroutine.

I uploaded the module to a GitHub repository for my blog. You can look at the file that defines the PrintToLog subroutine.

Download functions from GitHub in SAS 9.4

In SAS 9.4, you can copy the files from the Git repo onto your PC or to any directory that SAS can write to. I chose the location C:\Downloads\BlogRepo. I turn on the DLCREATEDIR option to tell SAS that it should create the directory if it does not already exist. If you haven't used Git before, be aware that "cloning a repo" will copy all directories and all files. Even if you want only the PrintToLog module, you have to copy all files and all directories.

In SAS 9.4m6, the function to create a local copy of a remote repo is GITFN_CLONE. (This name was later deprecated in favor of GIT_CLONE, without the 'FN' characters.) The following SAS DATA step copies the files from the Git repo into a specified location on my PC in SAS 9.4m6:

/* Clone the GitHub repo into RepoPath on SAS 9.4M6 */
options dlcreatedir;  /* give permission to create the RepoPath directory, if it doesn't exist */
%let gitURL = https://github.com/sascommunities/the-do-loop-blog/;  /* Git repo to copy */
%let RepoPath = C:\Downloads\BlogRepo;                              /* location to put copy */
 
/* clone repository into RepoPath; if local repository exists, skip download */
data _null_;
if fileexist("&RepoPath.") then do;
   put 'Repository already exists; skipping the clone operation';
end;
else do;
   put "Cloning repository from &gitURL";
   /* NOTE: use GITFN_CLONE function for 9.4M6; use GIT_CLONE function for Viya */
   rc = gitfn_clone("&gitURL", "&RepoPath." ); 
end;
run;

If I navigate to the RepoPath directory on my PC, I can see that the entire repo has been copied, including the PrintToLog directory and SAS file.
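
The same check can be done in code. The following is a minimal sketch that uses only the FILEEXIST function and the path to the PrintToLog file that appears in the next step. The log messages are illustrative:

/* Sketch: confirm that the cloned repo contains the file to be included */
data _null_;
if fileexist("&RepoPath/printtolog/printtolog.sas") then
   put 'Found printtolog.sas in the local repository';
else
   put 'printtolog.sas was not found; the clone operation might have failed';
run;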

I can therefore include the file into a SAS program in the usual way by using the %INCLUDE statement:

proc iml;
/* define the PrintToLog subroutine */
%include "&RepoPath/printtolog/printtolog.sas";
/* call the PrintToLog subroutine */
run PrintToLog("This is a test message.", 0);
quit;

If you run the program, it prints the following message to the log:

NOTE: This is a test message.

The GITFN_CLONE function downloads a repo to an empty directory. If the remote repo changes and you want to download the new version, you can delete the local repo and rerun the DATA _NULL_ step.
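
An alternative to deleting and re-cloning is to pull the latest changes into the existing local repo. The following is only a sketch under stated assumptions: it assumes that the GITFN_PULL function (named GIT_PULL in later releases and in SAS Viya) needs only the local directory as an argument for a public repo, and that it returns 0 on success. Check the documentation for your release before relying on it:

/* Sketch: update an existing local repo instead of re-cloning it.
   ASSUMPTION: GITFN_PULL needs only the directory for a public repo
   and returns 0 on success; use GIT_PULL in later releases and in Viya */
data _null_;
if fileexist("&RepoPath.") then do;
   rc = gitfn_pull("&RepoPath.");
   if rc = 0 then put 'Local repository is up to date';
   else put 'Pull failed; return code = ' rc;
end;
else put 'No local repository found; run the clone step first';
run;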

Download functions from GitHub in SAS Viya

Technically, there is no need to download the PrintToLog subroutine in SAS Viya, since the subroutine is built into SAS IML in SAS Viya. Nevertheless, let's examine how to clone a Git repo on SAS Viya.

Here's the main issue: If you are running a program on SAS Viya "in the cloud," the program is probably executing on a remote server. Files on your local machine (on which you are running SAS Studio in a browser) might not be accessible to the server. In addition, you might not have administrative privileges to add new files and directories on the server. So how can you download files that you can %INCLUDE into a program?

Chris Hemedinger's blog post introduced me to a wonderful idea: put the files into the WORK libref or some other libref for which you have write permission. If you use a temporary libref such as WORK, the repo will disappear at the end of the SAS session. If you want the repo to persist, clone it to a permanent libref.

Thus, I need to modify only two lines in the previous program: define the RepoPath macro variable to point to a writable location, and call the GIT_CLONE function (note the different name), as follows:

/* Clone the GitHub repo into RepoPath on SAS Viya */
options dlcreatedir;  /* give permission to create the RepoPath directory if it doesn't exist */
%let gitURL = https://github.com/sascommunities/the-do-loop-blog/;  /* Git repo to copy */
%let RepoPath = %sysfunc(getoption(WORK))/BlogRepo;                 /* location to put copy */
 
/* clone repository into RepoPath; if local repository exists, skip download */
data _null_;
if fileexist("&RepoPath.") then do;
   put 'Repository already exists; skipping the clone operation';
end;
else do;
   put "Cloning repository from &gitURL";
   /* NOTE: use GITFN_CLONE for 9.4M6; use GIT_CLONE for 9.4M7 and later and for Viya */
   rc = git_clone("&gitURL", "&RepoPath." ); 
end;
run;
 
proc iml;
/* define the PrintToLog subroutine */
%include "&RepoPath/printtolog/printtolog.sas";
/* call the PrintToLog subroutine */
run PrintToLog("This is a test message.", 0);
quit;

The %INCLUDE statement successfully reads the printtolog.sas file, which is located in the local copy of the Git repo. You can set the SOURCE2 option (by running OPTIONS SOURCE2) if you want the log to display the code that is read by the %INCLUDE statement.
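
For example, the following sketch turns on the SOURCE2 option for one PROC IML step and then turns it off again. It reuses the RepoPath macro variable from the previous program:

options source2;     /* echo the included source code to the log */
proc iml;
%include "&RepoPath/printtolog/printtolog.sas";
run PrintToLog("This is a test message.", 0);
quit;
options nosource2;   /* stop echoing included code */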

Best practices to create a SAS IML library

Typically, people want to share a library of SAS IML functions that are related to each other. For example, the functions might all perform computations in a subject area such as computational biology or financial risk management. Here are a few best practices for sharing a library of SAS IML functions with others in a GitHub repo:

  • Create the files so that they can be included in a PROC IML program. That means that each file should consist of a series of START/FINISH statements that define the modules. The file should NOT contain a PROC IML statement or a QUIT statement.
  • End each file with a STORE statement that stores the functions. That way, the modules are stored whenever a user includes the file. The stored modules are faster to load because they do not need to be parsed when they are loaded into a PROC IML program.
  • If the functions are related, make the module names start with a common prefix. For example, if you distribute functions that compute quantities about polygons, you might choose to use the string "Poly_" as the first few characters of each name.
  • Include documentation and examples of using the functions. The documentation explains the input and output arguments. The examples demonstrate how to call the functions and explain the results.

For example, the following template shows how you might structure a file (named Sim_Define.sas) that contains many functions that are related to simulations:

/* do not use the PROC IML statement */
start Sim_Normal(n, mean=0, std=1);
   ...
finish;
start Sim_LogNormal(n, mean=0, std=1);
   ...
finish;
start Sim_Exp(n, scale=1);
   ...
finish;
store module=(Sim_Normal Sim_LogNormal Sim_Exp);
/* do not use the QUIT statement */

After downloading the file, a SAS IML programmer can use the functions as follows:

proc iml;
%include "&RepoPath/Sim/Sim_Define.sas";   /* read the function definitions and store them */
x = Sim_Normal(100);
quit;

Because the file ends with a STORE statement, you can use the LOAD statement for subsequent calls:

proc iml;
load module=(Sim_Normal);   /* load individual modules or LOAD MODULE=_ALL_; */
x = Sim_Normal(100);
quit;
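
To make the best practices concrete, here is a sketch of what a small library file might look like when the module bodies and documentation are filled in. The bodies are hypothetical and are provided only for illustration; they are not taken from an existing repo. The sketch assumes that the RANDGEN subroutine is used to generate the random values:

/* Sim_Define.sas (hypothetical contents, for illustration only) */
/* Sim_Normal: return an n x 1 vector of draws from a normal distribution
   n    : number of observations
   mean : mean of the distribution (default 0)
   std  : standard deviation of the distribution (default 1) */
start Sim_Normal(n, mean=0, std=1);
   x = j(n, 1, .);                         /* allocate the result vector */
   call randgen(x, "Normal", mean, std);   /* fill x with normal variates */
   return( x );
finish;
/* Sim_LogNormal: return an n x 1 vector of draws from a lognormal distribution
   mean, std : parameters of the underlying normal distribution */
start Sim_LogNormal(n, mean=0, std=1);
   x = Sim_Normal(n, mean, std);           /* reuse the module defined above */
   return( exp(x) );
finish;
store module=(Sim_Normal Sim_LogNormal);

A user could then set a seed and call a module with or without the optional arguments:

proc iml;
%include "&RepoPath/Sim/Sim_Define.sas";
call randseed(12345);            /* set the seed for reproducibility */
x = Sim_Normal(100, 10, 2);      /* 100 draws from N(10, 2) */
y = Sim_LogNormal(100);          /* use the default parameters */
quit;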

Summary

This article discusses how to use functions in SAS to download a Git repository of files. On SAS 9.4m6, you can call the GITFN_CLONE function in a DATA step to copy a repo from a remote site such as GitHub into a local repository. On SAS Viya in the cloud, you can use the GIT_CLONE function to copy a repo into a libref for which you have write permission. In either case, you can then use the %INCLUDE statement to read a file into a SAS program. In SAS IML, the primary application of this technique is to read a file that defines a series of related modules. The article concludes by providing some best practices for sharing a library of SAS IML modules.

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Comments

  1. Chris Hemedinger

    Thanks for sharing this, Rick! Readers might be interested to know that the name changes for the functions (GITFN_* -> GIT_*, such as in GIT_CLONE) are applied in SAS 9.4M7 (and of course carried into SAS 9.4M8), so most SAS users reading this should be able to use that convention.

    Also, if you don't care about the temporary path name where you clone a repo, you can have SAS generate a unique name and skip the "does it exist" step.

    options dlcreatedir;  /* give permission to create the RepoPath directory if it doesn't exist */
    %let gitURL = https://github.com/sascommunities/the-do-loop-blog/;  /* Git repo to copy */
    %let RepoPath = %sysfunc(getoption(WORK))/%sysfunc(uuidgen());        /* unique location to put copy */
    
