Using source control management with SAS Enterprise Guide

I work on a variety of projects at SAS, most of which require some level of team collaboration in source management systems. Due to the many technologies that we work with, SAS developers use different source management tools for different purposes. I've got projects in CVS, Subversion, and Git.

When it comes to writing and maintaining SAS programs (are you ready for this shocker?), I use SAS Enterprise Guide almost exclusively. I know how to use all of the other tools, but this is where I'm most productive.

So, here's the question that I hear all of the time: Does SAS Enterprise Guide integrate with a source control system? The answer is no. But it does integrate with the Windows file system, and the Windows file system integrates with source control.

Frankly, this approach is how I prefer to work anyway. Probably due to bad past experiences with source control integration in other IDEs, I prefer to deal with source control on the file system. When my content is ready to commit, I use the file system integration (usually in the form of Windows shell extensions) to review the commit, check differences and history, manage file merges (if needed), and actually commit the files.

For example, here is a picture of one of my project directories where we're using Subversion. I'm using TortoiseSVN to access the repository. You can see the icon overlays that indicate which files are up-to-date (green check marks) and which are modified locally (red exclamation marks).

How to enable relative file references

SAS Enterprise Guide allows you to link in SAS programs and external data files (such as Excel or CSV files), so you don't have to lock up all of your content in the project (EGP) file. When working with source control, you need to enable one additional trick: tell SAS Enterprise Guide to treat these file references as relative paths. (There's nothing like an absolute file path -- specific to your machine -- for messing up your collaboration effort.)

This setting is maintained per project. To set it:

  1. Select File->Project Properties. The Properties window appears.
  2. Select the File References tab.
  3. Check the box: "Use paths relative to the project for programs and importable files"
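
To make "relative to the project" concrete, here is a minimal Python sketch of the idea. It is only an illustration under my own assumptions, not how SAS Enterprise Guide implements the setting: the function name and the example paths are hypothetical. A file reference is rewritten relative to the folder that holds the EGP file, and a file that lives outside that folder tree has no usable relative form.

```python
from pathlib import PureWindowsPath
from typing import Optional


def to_project_relative(project_file: str, referenced_file: str) -> Optional[str]:
    """Express referenced_file relative to the folder containing project_file.

    Returns None when the referenced file is not inside the project folder
    tree (for example, on another drive), which is the case that still
    needs manual attention before sharing the project.
    """
    project_dir = PureWindowsPath(project_file).parent
    target = PureWindowsPath(referenced_file)
    try:
        return str(target.relative_to(project_dir))
    except ValueError:
        # relative_to raises ValueError when target is not under project_dir
        return None


if __name__ == "__main__":
    # Hypothetical project location and file references
    egp = r"C:\Projects\Netflix\NetflixHistory.egp"
    for ref in (
        r"C:\Projects\Netflix\programs\import.sas",
        r"C:\Projects\Netflix\data\history.xlsx",
        r"D:\Shared\ratings.xlsx",
    ):
        print(ref, "->", to_project_relative(egp, ref))
```

The first two references sit below the project folder, so they survive a checkout to any location on any machine. The third one does not, and it would break for a teammate who has no matching D: drive.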

An example on GitHub

To prove that this actually works, I "refactored" one of my own projects to share with you. It's the project that analyzes my Netflix rental/streaming history. I placed the Excel files in a "./data" subfolder, and all of the SAS programs in a "./programs" subfolder. These assets, combined with the process flow "recipes" within the project, should allow anybody to re-run my project and to replicate my results. (Note: these projects do assume a Local SAS session. If you don't have a Local SAS, you must use Tools->Project Maintenance to replace the server references for your environment.)

The projects (one for v4.3, one for v5.1) are in this GitHub repository. Git is fast becoming the most popular source control system for all types of developers, so it doesn't get much more modern than this. If you're brave and you have some time to kill, give it a try. And please, don't be too critical of my movie rental history.

tags: CVS, github, SAS Enterprise Guide, scm, source control, Subversion

11 Comments

  1. Posted October 29, 2012 at 6:49 pm

    As an SVN and CVS user outside of work for non-SAS projects, I've hoped our SAS coders would adopt some kind of SCM. We mostly use Base SAS and some EG.

    This post is a good proof of concept, but I think our team is a long way off from SCM for a few reasons including:

    * Code, input data, and output are not well organized in folders (not even on the same drive).
    * There is little distinction between phases of development.
    * Thinking in terms of atomic commits (a best practice, I think, for SCM) can be hard.

    • Chris Hemedinger
      Posted October 29, 2012 at 7:52 pm

      Andrew,

      Thanks for the comment. I also find "source control culture" is challenging for those team members who are accustomed to "all-SAS" projects, where the entire deliverable is a collection of SAS programs. But so many of our projects these days are a true mix of tech - some SAS combined with other technology to "glue together" a process.

      It's a balancing act. We want our SCM process to support our SAS work without dictating how we organize it. SCM is an essential part of team collaboration (not to mention Agile and continuous integration), and it requires that all team members "get on the same page" for how to use it.

  2. Dave Garbutt
    Posted October 30, 2012 at 1:15 pm

    I was involved in implementing source control for SAS using ClearCase, and I use SlickEdit for my IDE.

    It integrates with ClearCase and lets you check files out and in, see history, etc. We also version-control SAS datasets on this system. ClearCase is great because it emulates a file system, so you can put anything on top and add versioning very easily.

    The best SAS IDE I have ever seen is Macumba from Bayer. There was a talk on it at PhUSE two weeks ago. It includes a dataset debugger and they are working on a macro debugger!

    http://www.lexjansen.com/cgi-bin/xsl_transform.php?x=phuse2012&s=proceedings&c=phuse#phuse2012.ad01

  3. Denis Richardson
    Posted October 31, 2012 at 3:06 pm

    What about a plugin or add-on to integrate with Subversion? Anything out there, or should we create one??

    • Chris Hemedinger
      Posted October 31, 2012 at 3:40 pm

      Denis,

      A plug-in for something like this is possible to make, although EG doesn't have the hooks to provide the integration you might be accustomed to. I built an example "program manager" custom task a couple of years ago. I built it for a customer who was interested in extending it to integrate with their own SCM.

      You can find the Program Manager example at go.sas.com/customtasksapi.

  4. Bob
    Posted November 6, 2012 at 7:02 am

    Hi Chris

    We have a policy of date-stamping all our datasets in the naming convention, e.g. SALES_YYYYMM. As we have a number of projects that get reused each month, we have the option of either editing the name of the input dataset (when feeding into a query), or taking the code behind the query, pasting it into a code node, and parameterising it with prompt values. The latter is great for speed, but terrible for support and maintenance if you are not a coder and need to change joins or add new tables. Do you have any suggestions for reverse engineering the code back into the query wizard, or are there any plans to support variables in the input dataset names in the query wizard? I suppose the simplest solution is to keep a copy of the project with the queries built using the wizard!

    • Chris Hemedinger
      Posted November 6, 2012 at 9:57 am

      Bob, have you seen the Query Template feature in SAS Enterprise Guide 5.1? That might serve your purpose. Design the query once, and then reuse with different data as needed. Here is a link to a comprehensive paper from SAS Global Forum 2012, "Finding Your Inner Query".

      • Bob
        Posted November 7, 2012 at 6:22 am

        Actually I did, but didn't understand its application. I will follow up on your link.

        • Bob
          Posted November 8, 2012 at 8:03 am

          Ugh! I can see its uses, but sadly not in this case. For us it would be easier to edit the properties of the input datasets than to use templates. I guess it's one of those things that's a trade-off.

  5. David Rice
    Posted May 16, 2013 at 4:07 pm

    Is there a way - without removing and re-adding programs and importable files - to convert absolute file paths to relative paths? We were not previously using a formal version control system but are in the process of implementing AccuRev. External files added to project files prior to checking the "Use paths relative to the project ..." checkbox retain the absolute file paths.

    Thanks!
    David

    • Chris Hemedinger
      Posted May 17, 2013 at 9:30 am

      David,

      As long as the referenced files are in the same folder as the EGP file, or in a subfolder below the EGP location, the file references should resolve without additional work. If the referenced files are not under the same root directory as the EGP file, then you will have some fixup work to do.

2 Trackbacks

  1. [...] Using source control management with SAS Enterprise Guide [...]

  2. […] Why Git? Functionally, it fits the purpose. And the SAS team was able to embed the necessary pieces within the application, so you don't need to install additional tools before getting started. And besides, all of the cool kids use Git these days. If you need to work with Subversion or another tool, you can still use this file-system technique. […]
