It seems that everyone knows about GitHub -- the service that hosts many popular open source code projects. The underpinnings of GitHub are based on Git, which is itself an open-source implementation of a source management system. Git was originally built to help developers collaborate on Linux (yet another famous open source project) -- but now we all use it for all types of projects.
There are other free and for-pay services that use Git, like Bitbucket and GitLab. And there are countless products that embed Git for its versioning and collaboration features. In 2014, SAS developers added built-in Git support for SAS Enterprise Guide.
Since then, Git (and GitHub) have grown to play an even larger role in data science operations and DevOps in general. Automation is a key component for production work -- including check-in, check-out, commit, and rollback. In response, SAS has added Git integration to more SAS products, including:
- the Base SAS programming language, via a collection of SAS functions
- SAS Data Integration Studio, via a new source control plug-in
- SAS Studio (v3.8 for SAS 9.4, and SAS Viya 3.5 and later)
I've recorded a tutorial (12 minutes or so) that you can watch to learn how to get started quickly!
You can use this Git integration with any service that supports Git (GitHub, GitLab, etc.), or with your own private Git servers and even just local Git repositories.
Watch related webinar: Using SAS® With Git: Bring a DevOps Mindset to Your SAS® Code
SAS functions for Git
Git infrastructure and functions were added to SAS 9.4 Maintenance 6. The new SAS functions all have the helpful prefix of "GITFN_" (signifying "Git fun!", I assume). Here's a partial list:
|Function|Description|
|---|---|
|GITFN_CLONE|Clones a Git repository (for example, from GitHub) into a directory on the SAS server.|
|GITFN_COMMIT|Commits staged files to the local repository.|
|GITFN_DIFF|Returns the number of diffs between two commits in the local repository and creates a diff record object for the local repository.|
|GITFN_PUSH|Pushes the committed files in the local repository to the remote repository.|
|GITFN_NEW_BRANCH|Creates a Git branch.|
The function names make sense if you're familiar with Git lingo. If you're new to Git, you'll need to learn the terms that go with the commands: clone, repo, commit, stage, blame, and more. This handbook provided by GitHub is friendly and easy to read. (Or you can start with this xkcd comic.)
You can learn about the SAS functions from the SAS documentation -- including important details about how to connect SAS to Git.
Here's an example program that clones (that is, copies into a local space) a repository that contains code samples from my blog:
    data _null_;
      version = gitfn_version();
      put version=;
      rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/",
                       "c:\Projects\sas-dummy-blog");
      put rc=;
    run;
In one line, this function fetches an entire collection of code files from your source control system. Here's a more concrete example that fetches the code to a work space, then runs a program from that repository. (This is safe for you to try -- here's the code that will be pulled/run. It even works from SAS University Edition.)
    options dlcreatedir;
    %let repoPath = %sysfunc(getoption(WORK))/sas-dummy-blog;
    libname repo "&repoPath.";
    libname repo clear;

    /* Fetch latest code from GitHub */
    data _null_;
      rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/",
                       "&repoPath.");
      put rc=;
    run;

    /* run the code in this session */
    %include "&repoPath./rng_example_thanos.sas";
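The program above runs the %include whether or not the clone worked. Here's a minimal sketch of one way to guard it, assuming that a return code of 0 from GITFN_CLONE means success and that your SAS release supports %IF/%THEN/%DO in open code (SAS 9.4M5 or later). The macro variable name clone_rc is made up for this illustration.

```sas
options dlcreatedir;
%let repoPath = %sysfunc(getoption(WORK))/sas-dummy-blog;
libname repo "&repoPath.";   /* creates the folder if it does not exist */
libname repo clear;

data _null_;
  rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/",
                   "&repoPath.");
  /* clone_rc is a hypothetical name; it carries the return code */
  /* out of the DATA step so that macro logic can test it        */
  call symputx("clone_rc", rc);
run;

/* Assumption: rc=0 signals a successful clone */
%if &clone_rc. = 0 %then %do;
  %include "&repoPath./rng_example_thanos.sas";
%end;
%else %do;
  %put ERROR: Clone failed with rc=&clone_rc.. Skipping the include.;
%end;
```

Keep in mind that cloning into a folder that already holds the repository is likely to fail, so a second run in the same SAS session would take the %ELSE branch unless you point repoPath somewhere new.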
You could use the other GITFN functions to stage and commit the output from your SAS jobs, including log files, data sets, ODS results -- whatever you need to keep and version.
Using Git in SAS Data Integration Studio
SAS Data Integration Studio has supported source control integration for many years, but only for CVS and Subversion (still in wide use, but they aren't media darlings like GitHub). By popular request, the latest version of SAS Data Integration Studio adds support for a Git plug-in.
See the documentation for details: How to use the Git plug-in for SAS Data Integration Studio. Or, see this very detailed SAS communities article, with a tutorial video included!
Using Git in SAS Studio
Beginning with SAS Studio 3.8, you can manage your SAS programs in a Git repository. This integration requires a bit of setup to allow SAS Studio to connect to your repository "as you" using the standard mechanism of SSH public/private keys. Once configured, you can add repositories to your SAS Studio session, fetch the latest versions of files, stage new files, commit files, and see history. You'll see the Git content set apart with a special icon, indicating that it's managed in Git.
Read more about setup and use in the SAS Studio documentation
Add SAS Studio custom tasks from Git
Did you know that you can add custom tasks to SAS Studio? And that you can share these tasks in a central location using Git? This feature has been available for several releases. You can configure this in the Task Repositories pane of the Preferences window.
You can try this with a collection of SAS-supplied custom tasks, available here as part of our "Custom Tasks Tuesday" series.
Using Git in SAS Enterprise Guide
This isn't new, but I'll include it for completeness. SAS Enterprise Guide includes built-in Git repository support for SAS programs that are stored in your project file. You can use this feature without having to set up any external Git servers or repositories. Also, SAS Enterprise Guide can recognize when you reference programs that are managed in an external Git repository. This integration enables features like program history, compare differences, commit, and more. Read more and see a demo of this in action here.
If you use SAS Enterprise Guide to edit and run SAS programs that are managed in an external Git repository, here's an important tip. Change your project file properties to "Use paths relative to the project for programs and importable files." You'll find this checkbox in File->Project Properties.
With this enabled, you can store the project file (EGP) and any SAS programs together in Git, organized into subfolders if you want. As long as these are cloned into a similar structure on any system you use, the file paths will resolve automatically.
SAS Enterprise Guide v8.2 includes even more Git integration, with support for cloning repositories, pull, push, and managing branches.
This is great! Thanks for the update, Chris.
Sounds like a follow-up to my Code Like It Matters paper might be necessary...
Paul, Go for it!
On the topic of GIT commits - for large projects, there is value in standardising your commit messages! Here's a great approach: https://www.conventionalcommits.org/en/v1.0.0-beta.2/
There's even an NPM library that will package these kinds of commits and create a changelog / release notes for you, as well as semantic versioning: https://github.com/conventional-changelog/standard-version
Thanks for this great post! I tried running the cloning code you supplied but got this error:
ERROR: Unable to load libgit2 module.
I googled that error to no avail; have you ever seen it?
I'm checking into that -- looks like SAS can't find the Git library that it needs to interface with Git. You didn't mention your OS and SAS version. Assuming SAS 9.4m6 -- but Windows or Linux or what? And using Base SAS, EG, or SAS Studio?
Oops, sorry for omitting that info. Looks like 9.4 m5.
Current version: 9.04.01M5P091317
Operating System: LIN X64 .
And I'm running SAS through EG
If you're running on Linux, there was an issue with loading the git libraries that was addressed with a 9.4m6 hotfix.
The other possibility is that your version of SAS is 9.4m5 which has the first iteration of the functions but the git libraries were not being shipped with this version of SAS. The functions in 9.4m5 are not considered production and were not documented for this reason.
Very interesting post. I have a question about SAS Enterprise Guide.
I have a project with 3 process flows and an ordered list to execute them: process flow1 -> process flow2 -> process flow3
Is there any option to stop the execution if one of the process flows gives an error?
I would like to stop execution if process flow1 gives errors, so that process flow2 and process flow3 do not run.
Thanks in advance
The only way that I can think of would be to add a new node, probably a Program node, to the start of each flow. Then add a Condition to that node to check for an error or perhaps a macro variable flag that you define.
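Here's a rough sketch of the macro variable flag idea, assuming that the automatic macro variable SYSCC has not been reset since the flow started (values above 4 indicate an error). The flag name flow1_ok is hypothetical. A program like this would run as the last step of process flow1:

```sas
/* Last step of process flow1: record whether anything has failed so far. */
/* flow1_ok is a hypothetical flag name used only for this sketch.        */
%global flow1_ok;

/* SYSCC: 0 = clean, 4 = warnings, greater than 4 = errors.  */
/* %EVAL returns 1 when the comparison is true, 0 otherwise. */
%let flow1_ok = %eval(&syscc. <= 4);

%put NOTE: flow1_ok=&flow1_ok. (SYSCC=&syscc.);
```

The Condition on the first node of process flow2 could then test whether flow1_ok equals 1 and skip the rest of the flow when it doesn't.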
Chris and SAS,
Is there a way for me to load the Git functions and try them out even before my very large organization upgrades us to 9.4M6?
Saw you in person in Dallas -- thanks for all your good work.
You could try them in SAS University Edition -- free to download and install for noncommercial purposes. Use that as a POC and to help justify the upgrade!
Git integration in EG 8 looks very promising!
My organization only permits access to our own Git installation through ssh. That prevents me from using the built-in Git integration in EG (http(s) only).
Are there plans to let EG users connect through ssh anytime soon?
Fredrik, with the latest changes in SAS Enterprise Guide 8.2, you can't Clone a repository with ssh, only http. But if you clone using another tool, you can use the Git Repositories->Add feature to point EG to your local Git repo and work with it that way.
Yes. That's very nice! If only I could push/pull (to ssh) it would be perfect! :-)
And another thing. If I could see the whole git-working tree (not only changed files), I could browse and open all my programs from one place.
Push/pull is now supported in the latest EG and SAS Studio clients. EG supports cloning with HTTPS only, and SAS Studio supports SSH and HTTPS. But you can connect EG to a local repository you've already cloned another way.
SAS Studio lets you add the repo as a Folder Shortcut and then you can navigate your content. EG uses the standard File->Open approach to getting to your content.
Thank you for this post! I have a question. With CI MA 6.5 (used with DI for data loading and EG for custom code management), an upgrade from 9.4m4 to m6 or m7 is planned for the near future, along with the possible introduction of a version control system for the custom code base. Would you recommend introducing Git before or after/during the maintenance level migration?
Thank you much,
Git (and the practice of using version control) can take some time to get used to, so I recommend introducing it sooner...and then when the new tools arrive, you'll be able to accelerate your time to adopt them.
Thank you much for the quick reply, that's reasonable.
We are using SAS Enterprise Guide 8.3 (8.3.0/103). Our company only allows projects (code) to be stored on our SAS server (not GitHub or GitLab). What needs to be done on our SAS server to host repositories and do version control internally? Thanks
You can use Program History in EG but it requires that your programs are embedded in EG projects, not external files. You can use Git without a Git server (like GitLab or GitHub) by managing your code in a local repository. That allows for program history and versions -- but does not support collaboration or backup features. To use Git features from EG, that local repository needs to be managed from the machine/network where EG is installed. If your code is always on the SAS server file system, then you cannot use Git directly from EG. However, SAS Studio (which runs on a central server) can use Git with the server file system.
I have followed the instructions, but when trying to initialise the repository I get a 'Failed to initialize local repository' error. I can initialize, push, and pull from all other applications to this repository, but not from SAS Data Integration Studio. Any ideas?
Richard, you should probably contact SAS Technical Support for help on this and to track it.
Hi Chris, thanks for the tutorial. I'm working in SAS Data Integration Studio and I successfully completed the configuration for my GitHub, but when I try to send something to GitHub by clicking "Archive as SAS Package", the package is not sent to the Git repository; instead it is exported to a local directory. So the Git configuration seems to be OK, but nothing can be sent to the repository. Do you have any idea about a possible resolution?
Git works by allowing you to work with a repository in a local directory, and then use Git tools to commit/push the artifacts to your Git server. So this may be working as expected, but I suggest working with SAS Technical Support to ensure that you're following the correct steps and seeing the expected result.
Hi, thanks for the reply.
I created a repository directory, but the exported package is not stored in that location. Of course, I entered the path of the repository directory in the settings panel of the Git plug-in, and the initialization of the repository was successful, but it seems to make no difference. When I try to archive a package, SAS asks me to enter a Name and a Description, which seems to confirm that GitHub is ready to receive my versions, but unfortunately the package is not sent where I want and no synchronization between GitHub and my directory takes place.
I see how that's confusing. Definitely work with SAS Tech Support to resolve this!
Hi Chris, good evening.
I have a question about how versions are stored in Git when using SAS Data Integration Studio. As you know, the way to permanently delete a version is to use the "Archivied SAS Package" window to remove the object you want to delete. With this approach, the file named "Archivies.xml" is updated and any reference to the deleted version disappears from it. However, if I delete a version manually from the RepositoryGit folder or from my centralized Git URL, I still see this version in the "Archivied SAS Package" history, and I can even import that version back. So my question is: where are these versions stored? The same question applies to version merging: if I have a version A (the old one) and a version B (the new one), where is the old version located? I can't find it physically, but I can use it for a merge, so it must exist somewhere!
Velerio, good questions. I don't know a lot about how SAS DI Studio works in this case. I suggest opening a question with SAS Technical Support to get the best answer.
Thank you for this article. I was wondering if more SAS programs are integrated with Git now, as of March 2022, for example Visual Analytics? Could you provide an update on that? Regards
SAS Visual Analytics is not integrated, but Model Manager is. More applications have Git integration on the roadmap where it makes sense.
Hi Chris, always enjoy reading your blog! Is there a recommended approach to using the Enhanced Editor with GitHub? Maybe just keeping all the code local but have an automatic sync up to GitHub nightly?
Yes, if using Base SAS on Windows, I think that's the way to go. SAS Display Manager won't have any built-in Git awareness. You could also consider using VS Code and the new SAS extension to manage your files (even if you use SAS on Windows to run them).
Hi Chris, thank you for the helpful blog. I am trying to use Git within SAS Studio (Viya 4). After connecting my GitHub account I can successfully clone my remote repo. When I change something in Viya and try to push my changes to the remote repo, I get a "remote 'origin' already exists" error. How can I fix this in the Studio interface, given that there is no command line in which I can do the things I used to do to fix these types of errors when working from my local computer?
Hi Shima, this was a problem with an earlier version of SAS Viya but was fixed in 2022.1.1. Do you have a later version than that or maybe this is something different?