It seems that everyone knows about GitHub -- the service that hosts many popular open source code projects. The underpinnings of GitHub are based on Git, which is itself an open-source implementation of a source management system. Git was originally built to help developers collaborate on Linux (yet another famous open source project) -- but now we all use it for all types of projects.
There are other free and for-pay services that use Git, like Bitbucket and GitLab. And there are countless products that embed Git for its versioning and collaboration features. In 2014, SAS developers added built-in Git support for SAS Enterprise Guide.
Since then, Git (and GitHub) have grown to play an even larger role in data science operations and DevOps in general. Automation is a key component for production work -- including check-in, check-out, commit, and rollback. In response, SAS has added Git integration to more SAS products, including:
- the Base SAS programming language, via a collection of SAS functions
- SAS Data Integration Studio, via a new source control plugin
- SAS Studio (now production in v3.8 with its first hotfix)
You can use this Git integration with any service that supports Git (GitHub, GitLab, etc.), or with your own private Git servers and even just local Git repositories.
Watch related webinar: Using SAS® With Git: Bring a DevOps Mindset to Your SAS® Code
SAS functions for Git
Git infrastructure and functions were added to SAS 9.4 Maintenance 6. The new SAS functions all have the helpful prefix of "GITFN_" (signifying "Git fun!", I assume). Here's a partial list:
| Function | Description |
| --- | --- |
| GITFN_CLONE | Clones a Git repository (for example, from GitHub) into a directory on the SAS server. |
| GITFN_COMMIT | Commits staged files to the local repository. |
| GITFN_DIFF | Returns the number of diffs between two commits in the local repository and creates a diff record object for the local repository. |
| GITFN_PUSH | Pushes the committed files in the local repository to the remote repository. |
| GITFN_NEW_BRANCH | Creates a Git branch. |
The function names make sense if you're familiar with Git lingo. If you're new to Git, you'll need to learn the terms that go with the commands: clone, repo, commit, stage, blame, and more. This handbook provided by GitHub is friendly and easy to read. (Or you can start with this xkcd comic.)
You can learn about the SAS functions from the SAS documentation -- including important details about how to connect SAS to Git.
Here's an example program that clones (that is, copies into a local space) a repository that contains code samples from my blog:
    data _null_;
      version = gitfn_version();
      put version=;
      rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/", "c:\Projects\sas-dummy-blog");
      put rc=;
    run;
In one line, this function fetches an entire collection of code files from your source control system. Here's a more concrete example that fetches the code to a work space, then runs a program from that repository. (This is safe for you to try -- here's the code that will be pulled/run. It even works from SAS University Edition.)
    options dlcreatedir;
    %let repoPath = %sysfunc(getoption(WORK))/sas-dummy-blog;
    /* With DLCREATEDIR set, assigning a libname creates the folder if needed */
    libname repo "&repoPath.";
    libname repo clear;

    /* Fetch latest code from GitHub */
    data _null_;
      rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/", "&repoPath.");
      put rc=;
    run;

    /* run the code in this session */
    %include "&repoPath./rng_example_thanos.sas";
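The program above runs the %include step whether or not the clone succeeded. Here is a minimal sketch of guarding that step. It assumes that GITFN_CLONE returns 0 on success (check the function documentation for your release). The macro variable cloneRc and the macro runIfCloned are hypothetical names that I made up for this illustration.

```sas
options dlcreatedir;
%let repoPath = %sysfunc(getoption(WORK))/sas-dummy-blog;
libname repo "&repoPath.";
libname repo clear;

/* Clone, then save the return code in a macro variable (hypothetical name) */
data _null_;
  rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/", "&repoPath.");
  call symputx("cloneRc", rc);
run;

/* Assumption: a return code of 0 means the clone succeeded */
%macro runIfCloned;
  %if &cloneRc. = 0 %then %do;
    %include "&repoPath./rng_example_thanos.sas";
  %end;
  %else %do;
    %put WARNING: Clone returned &cloneRc., so the include was skipped.;
  %end;
%mend runIfCloned;
%runIfCloned
```

Because the target folder lives under WORK, it starts out empty in each new SAS session. If you rerun the clone in the same session, expect a nonzero return code because the folder already has content.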
You could use the other GITFN functions to stage and commit the output from your SAS jobs, including log files, data sets, ODS results -- whatever you need to keep and version.
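As a rough sketch of that idea, the step below stages one file, commits it, and pushes it. The repository path, file name, author details, and credentials are all hypothetical placeholders. GITFN_IDX_ADD is not in the partial list above, and the argument order shown for all three functions is my assumption of the SAS 9.4M6 signatures, so verify each one against the documentation for your release before relying on it.

```sas
/* Hypothetical local repository path; replace with your own clone */
%let repoPath = /home/myuser/my-sas-project;

data _null_;
  /* Stage a new file (assumed arguments: directory, file, status) */
  rc = gitfn_idx_add("&repoPath.", "logs/nightly_job.log", "new");
  put "stage: " rc=;

  /* Commit staged files (assumed arguments: directory, reference, author, email, message) */
  rc = gitfn_commit("&repoPath.", "HEAD", "Your Name", "you@example.com",
                    "Add log from nightly job");
  put "commit: " rc=;

  /* Push to the remote (assumed arguments: directory, user, password) */
  rc = gitfn_push("&repoPath.", "your-user", "your-password-or-token");
  put "push: " rc=;
run;
```

In a real job you would not hardcode credentials in the program. Supply them from a protected source that your site approves.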
Using Git in SAS Data Integration Studio
SAS Data Integration Studio has supported source control integration for many years, but only for CVS and Subversion (still in wide use, but they aren't media darlings like GitHub). By popular request, the latest version of SAS Data Integration Studio adds a Git plug-in.
See the documentation for details: How to use the Git plug-in for SAS Data Integration Studio. Or, see this very detailed SAS communities article, with a tutorial video included!
Using Git in SAS Studio
Beginning with SAS Studio 3.8, you can manage your SAS programs in a Git repository. This integration requires a bit of setup to allow SAS Studio to connect to your repository "as you" using the standard mechanism of SSH public/private keys. Once configured, you can add repositories to your SAS Studio session, fetch the latest versions of files, stage new files, commit files, and see history. You'll see the Git content set apart with a special icon, indicating that it's managed in Git.
Read more about setup and use in the SAS Studio documentation
Add SAS Studio custom tasks from Git
Did you know that you can add custom tasks to SAS Studio? And that you can share these tasks in a central location using Git? This feature has been available for several releases. You can configure this in the Task Repositories pane of the Preferences window.
You can try this with a collection of SAS-supplied custom tasks, available here as part of our "Custom Tasks Tuesday" series.
Using Git in SAS Enterprise Guide
This isn't new, but I'll include it for completeness. SAS Enterprise Guide has built-in Git repository support for SAS programs that are stored in your project file. You can use this feature without having to set up any external Git servers or repositories. Also, SAS Enterprise Guide can recognize when you reference programs that are managed in an external Git repository. This integration enables features like program history, compare differences, commit, and more. Read more and see a demo of this in action here.
If you use SAS Enterprise Guide to edit and run SAS programs that are managed in an external Git repository, here's an important tip. Change your project file properties to "Use paths relative to the project for programs and importable files." You'll find this checkbox in File->Project Properties.
With this enabled, you can store the project file (EGP) and any SAS programs together in Git, organized into subfolders if you want. As long as these are cloned into a similar structure on any system you use, the file paths will resolve automatically.
SAS Enterprise Guide v8.2 includes even more Git integration, with support for cloning repositories, pull, push, and managing branches.
21 Comments
This is great! Thanks for the update, Chris.
Sounds like a follow-up to my Code Like It Matters paper might be necessary...
Paul, Go for it!
On the topic of Git commits - for large projects, there is value in standardising your commit messages! Here's a great approach: https://www.conventionalcommits.org/en/v1.0.0-beta.2/
There's even an NPM library that will package these kinds of commits and create a changelog / release notes for you, as well as semantic versioning: https://github.com/conventional-changelog/standard-version
Thanks for this great post! I tried running the cloning code you supplied but got this error:
ERROR: Unable to load libgit2 module.
I googled that error to no avail; have you ever seen it?
I'm checking into that -- looks like SAS can't find the Git library that it needs to interface with Git. You didn't mention your OS and SAS version. Assuming SAS 9.4m6 -- but Windows or Linux or what? And using Base SAS, EG, or SAS Studio?
Oops, sorry for omitting that info. Looks like 9.4 m5.
Current version: 9.04.01M5P091317
Operating System: LIN X64 .
And I'm running SAS through EG
Jed,
If you're running on Linux, there was an issue with loading the git libraries that was addressed with a 9.4m6 hotfix.
The other possibility is that your version of SAS is 9.4m5 which has the first iteration of the functions but the git libraries were not being shipped with this version of SAS. The functions in 9.4m5 are not considered production and were not documented for this reason.
Hi Chris,
Very interesting post. I have a question about SAS Enterprise Guide.
I have a project with 3 process flows and an ordered list to execute them: process flow1 -> process flow2 -> process flow3
Is there any option to stop the execution if one of the process flows gives an error?
I would like to stop execution if process flow1 gives errors, so that process flow2 and process flow3 do not run.
Thanks in advance
The only way that I can think of would be to add a new node, probably a Program node, to the start of each flow. Then add a Condition to that node to check for an error or perhaps a macro variable flag that you define.
Chris and SAS,
Is there a way for me to load the Git functions and try them out even before my very large organization upgrades us to 9.4M6?
Saw you in person in Dallas -- thanks for all your good work.
You could try them in SAS University Edition -- free to download and install for noncommercial purposes. Use that as a POC and to help justify the upgrade!
Hi.
Git integration in EG 8 looks very promising!
My organization only permits access to our own Git installation through ssh. That prevents me from using the built-in Git integration in EG (http(s) only).
Are there plans to let EG users connect through ssh anytime soon?
Best regards
Fredrik, with the latest changes in SAS Enterprise Guide 8.2, you can't Clone a repository with ssh, only http. But if you clone using another tool, you can use the Git Repositories->Add feature to point EG to your local Git repo and work with it that way.
Yes. That's very nice! If only I could push/pull (to ssh) it would be perfect! :-)
And another thing. If I could see the whole git-working tree (not only changed files), I could browse and open all my programs from one place.
Push/pull is now supported in the latest EG and SAS Studio clients. EG supports cloning with HTTPS only, and SAS Studio supports SSH and HTTPS. But you can connect EG to a local repository you've already cloned another way.
SAS Studio lets you add the repo as a Folder Shortcut and then you can navigate your content. EG uses the standard File->Open approach to getting to your content.
Dear Chris,
Thank you for this post! I have a question. With CI MA 6.5 (used with DI for data loading and EG for custom code management), an upgrade from 9.4m4 to m6 or m7 is planned for the near future, along with the possible introduction of a version control system for the custom code base. Would you recommend introducing Git before or after/during the maintenance level migration?
Thank you much,
Gabor
Git (and the practice of using version control) can take some time to get used to, so I recommend introducing it sooner...and then when the new tools arrive, you'll be able to accelerate your time to adopt them.
Thank you much for the quick reply, that's reasonable.