At SAS, we've published more repositories on GitHub as a way to share our open source projects and examples. These "repos" (that's Git lingo) are created and maintained by experts in R&D, professional services (consulting), and SAS training. Some recent examples include:
- sas_kernel, which provides Jupyter notebook support for SAS.
- sas-prog-for-r-users, the free course from SAS training that helps experienced R coders learn SAS.
- dm-flow, a series of ready-to-use examples for SAS Enterprise Miner users.
With dozens of repositories under the sassoftware account, it becomes a challenge to keep track of them all. So, I've built a process that uses SAS and the GitHub APIs to create reports for my colleagues.
Using the GitHub API
GitHub APIs are robust and well-documented. Like most APIs these days, you access them using HTTP and REST. Most of the API output is returned as JSON. With PROC HTTP and the JSON libname engine (new in SAS 9.4 Maint 4), using these APIs from SAS is a cinch.
The two API calls that we'll use for this basic report are:
- api.github.com/orgs/<organization>, which returns some metadata about the organization's account. For SAS Software, that's api.github.com/orgs/sassoftware.
- api.github.com/orgs/<organization>/repos, which returns a list of repositories underneath that account. For SAS Software, api.github.com/orgs/sassoftware/repos returns the first page of that list.
Fetching the GitHub account metadata
The following SAS program calls the first API to gather some account metadata. Then, it stores a selection of those values in macro variables for later use.
    /* Establish temp file for HTTP response */
    filename resp temp;

    /* Get Org metadata, including repo count */
    proc http
      url="https://api.github.com/orgs/sassoftware"
      method="GET"
      out=resp
    ;
    run;

    /* Read response as JSON data, extract select fields */
    /* It's in the ROOT data set, found via experiment   */
    libname ss json fileref=resp;

    data meta;
      set ss.root;
      call symputx('repocount',public_repos);
      call symputx('acctname',name);
      call symputx('accturl',html_url);
    run;

    /* log results */
    %put &=repocount;
    %put &=acctname;
    %put &=accturl;
Here is the output of this program (as of today):
    REPOCOUNT=66
    ACCTNAME=SAS Software
    ACCTURL=https://github.com/sassoftware
The important piece of this output is the count of repositories. We'll need that number in order to complete the next step.
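The program's comment says the ROOT data set was "found via experiment." A minimal sketch of that kind of experiment is below. It assumes the RESP fileref from the program above still holds the JSON response, and it relies on the ALLDATA data set that the JSON engine builds when no map is supplied. Your own exploration may look different.

```sas
/* Assumes RESP still holds the response from the PROC HTTP step above */
libname ss json fileref=resp;

/* List the data sets that the JSON engine created from the response */
proc datasets lib=ss;
quit;

/* ALLDATA holds every key/value pair from the response in one table, */
/* which helps you find where a field such as public_repos lives      */
proc print data=ss.alldata(obs=25);
run;
```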
Fetching the repositories and stats
It turns out that the /repos API call returns the details for 30 repositories at a time. For accounts with more than 30 repos, we need to call the API multiple times with a page= index value to iterate through each batch. I've wrapped this process in a short macro that repeats the calls as many times as needed to gather all of the data. This snippet calculates the upper bound of my loop index:
    /* Number of repos / 30, rounded up to next integer */
    %let pages=%sysfunc(ceil(%sysevalf(&repocount / 30)));
Given the 66 repositories on the SAS Software account right now, that results in 3 API calls.
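To check the arithmetic: 66 / 30 = 2.2, which rounds up to 3. The sketch below repeats the calculation with a hypothetical macro variable, TESTCOUNT, so that it doesn't overwrite the real REPOCOUNT value. It also shows an alternative form that uses the optional CEIL conversion type of %SYSEVALF; that variation is not part of the original program.

```sas
/* TESTCOUNT is a hypothetical stand-in for the real repo count */
%let testcount=66;

/* Same approach as above: floating-point divide, then CEIL via %SYSFUNC */
%let pages=%sysfunc(ceil(%sysevalf(&testcount / 30)));
%put &=pages;      /* PAGES=3 */

/* Alternative: let %SYSEVALF round up with its CEIL conversion type */
%let pages2=%sysevalf(&testcount / 30, ceil);
%put &=pages2;     /* PAGES2=3 */
```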
Each API call creates verbose JSON output with dozens of fields, only a few of which we care about for this report. To simplify things, I've created a JSON map that defines just the fields that I want to capture. I came up with this map by first allowing the JSON libname engine to "autocreate" a map file with the full response. I then edited that file and whittled the result down to just 12 fields. (Read my previous blog post about the JSON engine to learn more about JSON maps.)
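As a rough sketch of that "autocreate" step, the following fetches one page of the /repos response, asks the JSON engine to generate a full map, and echoes the map to the log so that it can be copied and trimmed. The FULLMAP fileref and ALLREPO libref are hypothetical names chosen for this illustration, and writing the map to a temporary fileref is an assumption; you might prefer a permanent file path.

```sas
/* Fetch one page of repository data to use as a sample */
filename resp temp;
proc http
  url="https://api.github.com/orgs/sassoftware/repos"
  method="GET"
  out=resp
;
run;

/* FULLMAP and ALLREPO are hypothetical names for this sketch */
filename fullmap temp;
libname allrepo json fileref=resp map=fullmap automap=create;

/* Echo the generated map to the log for review and editing */
data _null_;
  infile fullmap;
  input;
  put _infile_;
run;

libname allrepo clear;
```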
The multiple API calls create multiple data sets, which I must then concatenate into a single output data set for reporting. Then, to clean up, I use PROC DATASETS to delete the intermediate data sets.
First, here's the output data:
Here's the code segment, which is rather long because I included the JSON map inline.
    /* This trimmed JSON map defines just the fields we want  */
    /* Created by using AUTOMAP=CREATE on JSON libname        */
    /* then editing the generated map file to reduce to       */
    /* minimum number of fields of interest                   */
    filename repomap temp;
    data _null_;
     infile datalines;
     file repomap;
     input;
     put _infile_;
    datalines;
    {
      "DATASETS": [
        {
          "DSNAME": "root",
          "TABLEPATH": "/root",
          "VARIABLES": [
            {
              "NAME": "id",
              "TYPE": "NUMERIC",
              "PATH": "/root/id"
            },
            {
              "NAME": "name",
              "TYPE": "CHARACTER",
              "PATH": "/root/name",
              "CURRENT_LENGTH": 50,
              "LENGTH": 50
            },
            {
              "NAME": "html_url",
              "TYPE": "CHARACTER",
              "PATH": "/root/html_url",
              "CURRENT_LENGTH": 100,
              "LENGTH": 100
            },
            {
              "NAME": "language",
              "TYPE": "CHARACTER",
              "PATH": "/root/language",
              "CURRENT_LENGTH": 20,
              "LENGTH": 20
            },
            {
              "NAME": "description",
              "TYPE": "CHARACTER",
              "PATH": "/root/description",
              "CURRENT_LENGTH": 300,
              "LENGTH": 500
            },
            {
              "NAME": "created_at",
              "TYPE": "NUMERIC",
              "INFORMAT": [ "IS8601DT", 19, 0 ],
              "FORMAT": ["DATETIME", 20],
              "PATH": "/root/created_at",
              "CURRENT_LENGTH": 20
            },
            {
              "NAME": "updated_at",
              "TYPE": "NUMERIC",
              "INFORMAT": [ "IS8601DT", 19, 0 ],
              "FORMAT": ["DATETIME", 20],
              "PATH": "/root/updated_at",
              "CURRENT_LENGTH": 20
            },
            {
              "NAME": "pushed_at",
              "TYPE": "NUMERIC",
              "INFORMAT": [ "IS8601DT", 19, 0 ],
              "FORMAT": ["DATETIME", 20],
              "PATH": "/root/pushed_at",
              "CURRENT_LENGTH": 20
            },
            {
              "NAME": "size",
              "TYPE": "NUMERIC",
              "PATH": "/root/size"
            },
            {
              "NAME": "stars",
              "TYPE": "NUMERIC",
              "PATH": "/root/stargazers_count"
            },
            {
              "NAME": "forks",
              "TYPE": "NUMERIC",
              "PATH": "/root/forks"
            },
            {
              "NAME": "open_issues",
              "TYPE": "NUMERIC",
              "PATH": "/root/open_issues"
            }
          ]
        }
      ]
    }
    ;
    run;

    /* GETREPOS: iterate through each "page" of repositories */
    /* and collect the GitHub data                            */
    /* Output: <account>_REPOS, a data set with all basic data */
    /* about an account's public repositories                 */
    %macro getrepos;
      %do i = 1 %to &pages;
        proc http
          url="https://api.github.com/orgs/sassoftware/repos?page=&i."
          method="GET"
          out=resp
        ;
        run;

        /* Use JSON engine with defined map to capture data */
        libname repos json map=repomap fileref=resp;
        data _repos&i.;
          set repos.root;
        run;
      %end;

      /* Concatenate all pages of data */
      data sassoftware_allrepos;
        set _repos:;
      run;

      /* delete intermediate repository data */
      proc datasets nolist nodetails;
        delete _repos:;
      quit;
    %mend;

    /* Run the macro */
    %getrepos;
Creating a simple report
Finally, I want to create a simple report listing all of the repositories and their top-level stats. I'm using PROC SQL without a CREATE TABLE statement, which will create a simple ODS listing report for me. I use this approach instead of PROC PRINT because I transform a couple of the columns in the same step. For example, I create a new variable with a fully formed HTML link, which ODS HTML will render as an active link in the browser. Here's a snapshot of the output, followed by the code.
    /* Best with ODS HTML output */
    title "github.com/sassoftware (&acctname.): Repositories and stats";
    title2 "ALL &repocount. repos, Data pulled with GitHub API as of &SYSDATE.";
    title3 height=1 link="&accturl." "See &acctname. on GitHub";
    proc sql;
      select
        catt('<a href="',t1.html_url,'">',t1.name,"</a>") as Repository,
        case
          when length(t1.description)>50 then cat(substr(t1.description,1,49),'...')
          else t1.description
        end as Description,
        t1.language as Language,
        t1.created_at format=dtdate9. as Created,
        t1.pushed_at format=dtdate9. as Last_Update,
        t1.stars as Stars,
        t1.forks as Forks,
        t1.open_issues as Open_Issues
      from sassoftware_allrepos t1
      order by t1.pushed_at desc;
    quit;
Get the entire example
Not wanting to get too meta on you here, but I've placed the entire program on my own GitHub account. The program I've shared has a few modifications that make it easier to adapt for any organization or user on GitHub. As you play with this, keep in mind that the GitHub API is "rate limited" -- they allow only so many API calls from a single IP address in a certain period of time. That's to ensure that the APIs perform well for all users. You can use authenticated API calls to increase the rate-limit threshold for yourself, and I do that for my own production reporting process. But...that's a blog post for a different day.
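If you want to see where you stand against the rate limit while experimenting, GitHub exposes an api.github.com/rate_limit endpoint. The sketch below is one possible way to look at it from SAS. The RLRESP fileref and RLIMIT libref are hypothetical names, and the program prints the JSON engine's generic ALLDATA table rather than assuming specific field names in the response.

```sas
/* RLRESP and RLIMIT are hypothetical names for this sketch */
filename rlresp temp;

/* Ask GitHub for the current rate-limit status of this client */
proc http
  url="https://api.github.com/rate_limit"
  method="GET"
  out=rlresp
;
run;

/* Print every key/value pair in the response */
libname rlimit json fileref=rlresp;
proc print data=rlimit.alldata;
run;

libname rlimit clear;
```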
6 Comments
I am eager to try these examples but cannot until SAS OnDemand for Academics is upgraded to SAS 9.4M4. I heard from Support that this will only happen this summer. I guess I need to wait until then.
I have not used GitHub but feel I should learn. I noticed that in SAS EG you can enter your GitHub information. Where would you suggest that a longtime SAS programmer and sometime R user go to get "up to speed" on working with SAS and GitHub, as a statistician?
GitHub (the site) and Git (the version control system behind GitHub) can be great tools for personal productivity and for collaboration. For personal: it's a good mechanism to keep different versions of your work, maintain a history so that you can review changes and revert "mistakes" as needed. And for collaboration: find useful projects on GitHub, adapt them for your use or use them as-is, contribute as you can. For some demos of what you can do in EG, see this blog post. For examples of integration with SAS Studio and custom tasks, check out the Custom Task repository and examples.