The code and data that drive analytics projects are important assets to the organizations that sponsor them. As such, there is a growing trend to manage these items in source control systems of record. For most companies these days, that means Git. The specific system might be GitHub Enterprise, GitLab, or Bitbucket -- all platforms that are based on Git.
For a quick-start tutorial, check out the 12-minute video on the SAS Users YouTube channel.
Many SAS products support direct integration with Git. This includes SAS Studio, SAS Enterprise Guide, and the SAS programming language. (That last one checks a lot of boxes for ways to use Git and SAS together.) While we have good documentation and videos to help you learn about Git and SAS, we often get questions about "best practices" -- what is the best/correct way to organize your SAS projects in Git?
In this article I'll dodge that question, but I'll still try to provide some helpful advice in the process.
Guidelines for managing SAS projects in Git
It’s difficult for us to prescribe exactly how to organize project repositories in source control. Your best approach depends heavily on the type of work, the company organization, and the culture of collaboration. But I can provide some guidance -- mainly things to do and things to avoid -- based on experience.
Do not create one huge repository
DO NOT build one huge repository that contains everything you currently maintain. Your work only grows over time, and you'll come to regret (or have to revisit) the internal organization of a huge project. Once established, the folder structure and organization can be tricky to change. If you later try to break a large project into smaller pieces, it can be difficult or impossible to preserve source-control benefits like file histories and diffs.
Design with collaboration in mind
DO NOT organize projects based only on the teams that maintain them. And of course, don't organize projects based on individual team members.
- Good repo names: risk-adjustment-model, engagement-campaigns
- Bad repo names: joes-code, claims-dept
All teams reorganize over time, and you don't want to have to reorganize all of your code each time that happens. And code projects change hands, so keep the structure personnel-agnostic if you can. Major refactoring of code can introduce errors, and you don't want to risk that just because you got a new VP or someone changed departments.
Instead, DO organize projects based on the function/work that the code accomplishes. Think modular, but don't make projects too granular (or you'll have a million projects). I personally maintain several SAS code projects. The one thing they have in common is that I'm the main contributor -- but I organize them into functional repos that someone else could theoretically (oh please oh please) step in and take over.
Up with reuse, down with ownership
This might seem a bit communist, but collaboration works best when we don't regard code that we write as "our turf." DO NOT cling to notions of code "ownership." It makes sense for teams/subject-matter experts to have primary responsibility for a project, but systems like Git are designed to help with transparency and collaboration. Be open to another team member suggesting and merging (with review and approval) a change that improves things. GitHub, GitLab, and Bitbucket all support mechanisms for issue tracking and merge requests (called pull requests on GitHub and Bitbucket). These allow changes to be suggested, submitted, revised, and approved in an efficient, transparent way.
DO use source control to enable code reuse. Many teams have foundational "shared code" for standard operations, coded in SAS macros or shared statements. Consider placing these into their own project that other projects and teams can import. You can even use Git functions within SAS to fetch and include this code directly from your Git repository:
```sas
/* create a temp folder to hold the shared code */
options dlcreatedir;
%let repoPath = %sysfunc(getoption(WORK))/shared-code;
/* with DLCREATEDIR, assigning the libref creates the folder; */
/* the libref itself is not needed afterward                  */
libname repo "&repoPath.";
libname repo clear;

/* Fetch latest code from Git */
data _null_;
  rc = git_clone(
    "https://gitlab.mycompany.com/sas-projects/shared-code/",
    "&repoPath.");
  /* 0 means the clone succeeded */
  put rc=;
run;

options source2;
/* run the code in this session */
%include "&repoPath./bootstrap-macros.sas";
```
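If the shared repository holds many macros, an alternative to a single bootstrap %include is to add the cloned folder to the autocall search path, so SAS compiles each macro the first time it is called. This is a minimal sketch that assumes a hypothetical repo layout: a macros/ subfolder with one macro per file, each file named in lowercase after the macro it defines (for example, a hypothetical log_step.sas that defines %log_step).

```sas
/* Same temp location as the clone example; repeated so this runs on its own. */
%let repoPath = %sysfunc(getoption(WORK))/shared-code;

/* Hypothetical layout: &repoPath./macros holds one macro per file,    */
/* named in lowercase after the macro it defines (e.g. log_step.sas).  */
/* MAUTOSOURCE turns on autocall; APPEND= adds the folder to SASAUTOS  */
/* without replacing the paths that are already configured.           */
options mautosource append=(sasautos=("&repoPath./macros"));

/* Confirm the updated search path in the log */
%put NOTE: SASAUTOS is now %qsysfunc(getoption(sasautos));

/* A call such as %log_step(msg=Starting load) would now compile the   */
/* hypothetical macro from the repo the first time it is used.         */
```

The design trade-off: autocall loads only the macros a program actually uses, while a bootstrap %include runs everything up front.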
If you rely on a repository for shared code and components, make sure that tests are in place so changes can be validated and will not break downstream systems. You can even automate tests with continuous integration tools like Jenkins.
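As a minimal sketch of such a test, assuming a hypothetical shared macro named %flag_high: the program builds a small input table with a known answer, runs the macro, and checks the result. The macro is defined inline only so the example runs on its own; a real test would %include it from the shared-code repository. Run it as a batch job, where ABORT RETURN ends the SAS session with the given return code and a CI tool such as Jenkins can treat that as a failed build (in an interactive session, ABORT RETURN ends the session).

```sas
/* Hypothetical shared macro under test; normally pulled in with %include */
%macro flag_high(in=, out=, var=, limit=);
  data &out.;
    set &in.;
    /* 1 when the value exceeds the limit, otherwise 0 */
    high_&var. = (&var. > &limit.);
  run;
%mend flag_high;

/* Known input: exactly one row (id 2) is above the limit of 100 */
data work.test_in;
  input id amount;
  datalines;
1 50
2 150
3 100
;
run;

%flag_high(in=work.test_in, out=work.test_out, var=amount, limit=100);

/* Compare the result with the expected count of flagged rows */
data _null_;
  set work.test_out end=last;
  flagged + high_amount;
  if last then do;
    if flagged ne 1 then do;
      put "ERROR: flag_high test failed. Expected 1 flagged row, got " flagged;
      /* Batch mode: end SAS with return code 8 so CI marks the build failed */
      abort return 8;
    end;
    else put "NOTE: flag_high test passed.";
  end;
run;
```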
DO document how projects relate to each other and what they depend on, and prepare guidance so new team members can get started quickly. Most of us feel more accountable when we know that our code will be placed in central repositories visible to our peers. That may inspire cleaner code, more complete documentation, and a robust on-boarding process for new team members. Use the Markdown files (README.md and others) in a repository to keep your documentation close to the code.
Work with Git features (and not against them)
Once your project files are in a Git repository, you might need to change your way of working so that you aren't going against the grain of Git benefits.
DO NOT work on code changes in a shared directory with multiple team members -- you'll step on each other. Git's advantage is its distributed workflow: each developer works with their own copy of the repository and merges/accepts changes from others at their own pace.
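As a sketch of working from a personal copy rather than a shared folder, the following program clones a repository into the developer's own area on the first run and pulls teammates' changes on later runs. The /home/&sysuserid./git location and the risk-adjustment-model repository URL are hypothetical; adjust them for your environment. Private repositories also need credentials passed to the Git functions.

```sas
/* Hypothetical personal work area: one clone per developer */
%let gitHome = /home/&sysuserid./git;
%let myRepo  = &gitHome./risk-adjustment-model;

/* DLCREATEDIR creates one missing folder per LIBNAME statement */
options dlcreatedir;
libname _ghome "&gitHome.";
libname _ghome clear;
libname _grepo "&myRepo.";
libname _grepo clear;

data _null_;
  length repo $ 256;
  repo = "&myRepo.";
  if fileexist(cats(repo, "/.git")) then do;
    /* Already cloned: fetch and merge changes that others have pushed */
    rc = git_pull(repo);
    put "NOTE: git_pull " rc=;
  end;
  else do;
    /* First run: clone into the empty personal folder */
    rc = git_clone(
      "https://gitlab.mycompany.com/sas-projects/risk-adjustment-model/",
      repo);
    put "NOTE: git_clone " rc=;
  end;
run;
```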
DO use Git branching to organize and isolate changes until you are ready to merge them with the main branch. It takes a little learning and practice, but once you adopt a branching approach you'll find changes much easier to manage -- it beats keeping multiple copies of your code with slightly different file and folder names to mark "works in progress."
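SAS also provides Git functions for branch operations. As an alternative sketch that relies only on standard Git commands, this program creates and switches to a feature branch by running the git command line through a FILENAME PIPE. It assumes the XCMD system option is enabled and the git executable is on the PATH of the SAS session; the repository path and branch name are hypothetical.

```sas
/* Hypothetical repository path and branch name */
%let myRepo = /home/&sysuserid./git/risk-adjustment-model;
%let branch = fix-claims-date-logic;

/* Requires XCMD: run git and capture both stdout and stderr */
filename gitcmd pipe "git -C ""&myRepo."" checkout -b &branch. 2>&1";

data _null_;
  infile gitcmd;
  input;
  /* Echo each line of git output to the SAS log */
  put _infile_;
run;

filename gitcmd clear;
```

From there, commit on the branch, push it, and open a merge request so a teammate can review the change before it reaches the main branch.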
DO consider learning and using Git tools such as Git Bash (command line), Git GUI, and a code editor like VS Code. These don't replace the SAS-provided coding tools with their Git integration, but they can supplement your workflow and make it easier to manage content among several projects.
Learning more
When you're ready to learn more about working with Git and SAS, we have many webinars, videos, and documentation resources:
- Using SAS with Git: Bring a DevOps Mindset to Your SAS Code (Ask the Expert webinar)
- How do you manage your SAS projects with Git? (Ask the Expert webinar)
- Git functions in SAS 9.4 and SAS Viya (doc)
- Using Git in SAS Enterprise Guide (doc)
- Git with SAS Studio and SAS Enterprise Guide (video)
- Using built-in Git operations in SAS (blog)
- DevOps with SAS 9: SAS code, GitLab, and Jenkins (community)
- developer.sas.com for SAS app development
- Pro Git by Scott Chacon and Ben Straub, free online book about Git
- SAS Note about generating an SSH key for use with Git and SAS
Comments
How do you do branching? We are having trouble getting branching to work using SAS DI, GitHub, and Jenkins.
I assume branching has to be handled outside of DI Studio in the local repository. I don't think DI Studio provides a view into the active branch.
Hello, I'm having trouble committing .egp projects: Git shows only that the file has changed, not what was changed. Of course, that is not the case for .sas files, but I don't want to save each program as a separate .sas file since they all depend on each other. How do I make Git read each program inside the project so it shows exactly which lines I've changed in a commit?
Hi William, I'm afraid that while you can manage EGP files in Git, you won't have visibility into the code differences. EGP files use a binary format that cannot be "diffed" in a meaningful way. Best practice is to store your .sas files alongside your EGP file (in the same folder or in a subfolder), and turn on the Project Properties "use relative file references" option so that SAS Enterprise Guide can always resolve these file locations when you manage the collection of files in different root folders.