Reporting on GitHub accounts with SAS

At SAS, we've published more repositories on GitHub as a way to share our open source projects and examples. These "repos" (that's Git lingo) are created and maintained by experts in R&D, professional services (consulting), and SAS training. Some recent examples include:

With dozens of repositories under the sassoftware account, it becomes a challenge to keep track of them all. So, I've built a process that uses SAS and the GitHub APIs to create reports for my colleagues.

Using the GitHub API

GitHub APIs are robust and well-documented. Like most APIs these days, you access them using HTTP and REST. Most of the API output is returned as JSON. With PROC HTTP and the JSON libname engine (new in SAS 9.4 Maint 4), using these APIs from SAS is a cinch.

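The two API calls that we'll use for this basic report are:

  • https://api.github.com/orgs/sassoftware, which returns metadata about the account, including its count of public repositories
  • https://api.github.com/orgs/sassoftware/repos, which returns the details for each repository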

Fetching the GitHub account metadata

The following SAS program calls the first API to gather some account metadata. Then, it stores a selection of those values in macro variables for later use.

/* Establish temp file for HTTP response */
filename resp temp;
 
/* Get Org metadata, including repo count */
proc http
 url="https://api.github.com/orgs/sassoftware"  
 method="GET"
 out=resp
;
run;
 
/* Read response as JSON data, extract select fields */
/* It's in the ROOT data set, found via experiment   */
libname ss json fileref=resp;
 
data meta; 
  set ss.root; 
  call symputx('repocount',public_repos);
  call symputx('acctname',name);
  call symputx('accturl',html_url);
run;
 
/* log results */
%put &=repocount;
%put &=acctname;
%put &=accturl;

Here is the output of this program (as of today):

REPOCOUNT=66
ACCTNAME=SAS Software
ACCTURL=https://github.com/sassoftware

The important piece of this output is the count of repositories. We'll need that number in order to complete the next step.

Fetching the repositories and stats

It turns out that the /repos API call returns the details for 30 repositories at a time. For accounts with more than 30 repos, we need to call the API multiple times with a page= index value to iterate through each batch. I've wrapped this process in a short macro that repeats the calls as many times as needed to gather all of the data. This snippet calculates the upper bound of my loop index:

/* Number of repos / 30, rounded up to next integer     */
%let pages=%sysfunc(ceil(%sysevalf(&repocount / 30)));

Given the 66 repositories on the SAS Software account right now, that results in 3 API calls.
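As a quick check of that arithmetic, here is a minimal sketch that hard-codes the count reported above (an assumption for illustration; the real program reads it from the API). It also shows an alternative spelling of the same calculation, using the CEIL conversion type of %SYSEVALF.

```sas
/* Assumed value for illustration: the count reported above */
%let repocount=66;

/* 66 / 30 = 2.2, which rounds up to 3 pages */
%let pages=%sysevalf(&repocount / 30, ceil);
%put &=pages;
```

With 66 repositories, the log should show PAGES=3.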

Each API call creates verbose JSON output with dozens of fields, only a few of which we care about for this report. To simplify things, I've created a JSON map that defines just the fields that I want to capture. I came up with this map by first allowing the JSON libname engine to "autocreate" a map file with the full response. I edited that file and whittled the result to just 12 fields. (Read my previous blog post about the JSON engine to learn more about JSON maps.)

The multiple API calls create multiple data sets, which I then concatenate into a single output data set for reporting. To clean up, I use PROC DATASETS to delete the intermediate data sets.

First, here's the output data:

[Image: the output data set]

Here's the code segment, which is rather long because I included the JSON map inline.

/* This trimmed JSON map defines just the fields we want */
/* Created by using AUTOMAP=CREATE on JSON libname       */
/* then editing the generated map file to reduce to      */
/* minimum number of fields of interest                  */
filename repomap temp;
data _null_;
 infile datalines;
 file repomap;
 input;
 put _infile_;
 datalines;
{
  "DATASETS": [
 {
   "DSNAME": "root",
   "TABLEPATH": "/root",
   "VARIABLES": [
  {
    "NAME": "id",
    "TYPE": "NUMERIC",
    "PATH": "/root/id"
  },
  {
    "NAME": "name",
    "TYPE": "CHARACTER",
    "PATH": "/root/name",
    "CURRENT_LENGTH": 50,
    "LENGTH": 50
  },
  {
    "NAME": "html_url",
    "TYPE": "CHARACTER",
    "PATH": "/root/html_url",
    "CURRENT_LENGTH": 100,
    "LENGTH": 100
  },
  {
    "NAME": "language",
    "TYPE": "CHARACTER",
    "PATH": "/root/language",
    "CURRENT_LENGTH": 20,
    "LENGTH": 20
  },
  {
    "NAME": "description",
    "TYPE": "CHARACTER",
    "PATH": "/root/description",
    "CURRENT_LENGTH": 300,
    "LENGTH": 500
  },
  {
    "NAME": "created_at",
    "TYPE": "NUMERIC",
    "INFORMAT": [ "IS8601DT", 19, 0 ],
    "FORMAT": ["DATETIME", 20],
    "PATH": "/root/created_at",
    "CURRENT_LENGTH": 20
  },
  {
    "NAME": "updated_at",
    "TYPE": "NUMERIC",
    "INFORMAT": [ "IS8601DT", 19, 0 ],
    "FORMAT": ["DATETIME", 20],
    "PATH": "/root/updated_at",
    "CURRENT_LENGTH": 20
  },
  {
    "NAME": "pushed_at",
    "TYPE": "NUMERIC",
    "INFORMAT": [ "IS8601DT", 19, 0 ],
    "FORMAT": ["DATETIME", 20],
    "PATH": "/root/pushed_at",
    "CURRENT_LENGTH": 20
  },
  {
    "NAME": "size",
    "TYPE": "NUMERIC",
    "PATH": "/root/size"
  },
  {
    "NAME": "stars",
    "TYPE": "NUMERIC",
    "PATH": "/root/stargazers_count"
  },
  {
    "NAME": "forks",
    "TYPE": "NUMERIC",
    "PATH": "/root/forks"
  },
  {
    "NAME": "open_issues",
    "TYPE": "NUMERIC",
    "PATH": "/root/open_issues"
  }
   ]
 }
  ]
}
;
run;
 
/* GETREPOS: iterate through each "page" of repositories */
/* and collect the GitHub data                           */
/* Output: SASSOFTWARE_ALLREPOS, a data set with all     */
/*  basic data about the account's public repositories   */
%macro getrepos;
 %local i;
 %do i = 1 %to &pages;
  proc http
   url="https://api.github.com/orgs/sassoftware/repos?page=&i."  
   method="GET"
   out=resp
  ;
  run;
 
  /* Use JSON engine with defined map to capture data */
  libname repos json map=repomap fileref=resp;
  data _repos&i.;
   set repos.root;
  run;
 %end;
 
 /* Concatenate all pages of data */
 data sassoftware_allrepos;
  set _repos:;
 run;
 
 /* delete intermediate repository data */
 proc datasets nolist nodetails;
  delete _repos:;
 quit;
%mend;
 
/* Run the macro */
%getrepos;

Creating a simple report

Finally, I want to create a simple report listing all of the repositories and their top-level stats. I'm using PROC SQL without a CREATE TABLE statement, which will create a simple ODS listing report for me. I use this approach instead of PROC PRINT because I transform a couple of the columns in the same step. For example, I create a new variable with a fully formed HTML link, which ODS HTML will render as an active link in the browser. Here's a snapshot of the output, followed by the code.

[Image: sample report]

/* Best with ODS HTML output */
title "github.com/sassoftware (&acctname.): Repositories and stats";
title2 "ALL &repocount. repos, Data pulled with GitHub API as of &SYSDATE.";
title3 height=1 link="&accturl." "See &acctname. on GitHub";
proc sql;
 select 
  catt('<a href="',t1.html_url,'">',t1.name,"</a>") as Repository, 
 case 
  when length(t1.description)>50 then cat(substr(t1.description,1,49),'...')
  else t1.description
 end 
as Description,
 t1.language as Language,
 t1.created_at format=dtdate9. as Created, 
 t1.pushed_at format=dtdate9. as Last_Update, 
 t1.stars as Stars, 
 t1.forks as Forks, 
 t1.open_issues as Open_Issues
from sassoftware_allrepos t1
 order by t1.pushed_at desc;
quit;

Get the entire example

Not wanting to get too meta on you here, but I've placed the entire program on my own GitHub account. The program I've shared has a few modifications that make it easier to adapt for any organization or user on GitHub. As you play with this, keep in mind that the GitHub API is "rate limited" -- they allow only so many API calls from a single IP address in a certain period of time. That's to ensure that the APIs perform well for all users. You can use authenticated API calls to increase the rate-limit threshold for yourself, and I do that for my own production reporting process. But...that's a blog post for a different day.
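To see where you stand against the rate limit, GitHub offers a rate_limit endpoint. The following is only a sketch, not the code from my production process: the gh_token macro variable is a hypothetical placeholder for a personal access token, and the HEADERS statement is what makes the call authenticated. Drop that statement to see the limits for unauthenticated calls.

```sas
/* Hypothetical placeholder: substitute your own access token */
%let gh_token=REPLACE_WITH_YOUR_TOKEN;

filename rlresp temp;

proc http
 url="https://api.github.com/rate_limit"
 method="GET"
 out=rlresp;
 /* Authenticated call; remove this statement for anonymous limits */
 headers "Authorization"="token &gh_token.";
run;

/* ALLDATA shows every name-value pair in the response */
libname rl json fileref=rlresp;
proc print data=rl.alldata;
run;
```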


Learning SAS programming for R users

TL;DR

Free training from SAS: "SAS Programming for R Users." The schedule of Live Web offerings is here. If you prefer self-study, the complete course materials are on the SAS Software GitHub space and you can practice with the free SAS University Edition software.


The details: how R programmers can learn SAS for free

As much as I would love for SAS customers to use SAS to the exclusion of everything else, that rarely happens. Every time I visit a SAS customer, I hear about the other non-SAS tools that they use alongside SAS and their integration points. The most popular of these include desktop tools such as Microsoft Excel, or enterprise databases from other vendors. But increasingly, I hear from users who dabble in open source tools such as Python and R, or who work with other teams that use those tools.

Programmers tend to favor the programming languages that they know. When you learn a new programming language, your experience is colored by inevitable comparisons with the languages you've already mastered. If you work with R coders who want to learn SAS, you should consider that they probably won't learn SAS the same way that you did.

A SAS programming course for experienced programmers

The traditional way to learn SAS begins with the DATA step, where you learn how to read files, how to write files, about the program data vector, and basically how the DATA step "thinks". Then you move on to the various procedures for descriptive stats, reporting, and maybe even some graphing. While this approach can make you productive with simple tasks quickly, to an R coder this might feel too much like "starting over." That's why R programmers (or even MATLAB or Stata users) need an approach that leverages what they already know to hit the ground running.

That's the thinking behind the new SAS Programming for R Users course. This course does not start with the basics about statistics or the importance of data prep -- the assumption is that you already know that. Instead, you'll get hands-on experience with SAS/IML -- a statistical matrix language that will certainly feel familiar to R users. You'll eventually get to the DATA step and other procedures, of course -- and these will open new worlds for you -- but you'll learn to be productive quickly using the skills you already have. (You can read more about the genesis of the course from its creator and main instructor, Jordan Bakerman.)

The course centers around classic and real statistical problems, from Bayesian logistic regression to the Monty Hall problem. If you don't know your statistics, you might feel that you're swimming in waters over your head. But if you're comfortable with the concepts, you should feel right at home. (If you're just beginning with statistics, SAS offers this different free e-learning course.)

[Image: The classic game show proof]

"SAS Programming for R Users" also shows you how to use SAS and R together, submitting R code from within your SAS program. That's made possible by a special connection between SAS/IML and R -- something that SAS has supported for years.

This is an instructor-led course that's offered in Live Web format. "Live Web" means that you connect from your desk at home or work, tune into the lecture and demos, and then practice your skills on a hosted classroom environment. And this course is free -- costing you only your time (5 half-day sessions). Check out the SAS Training site to see when the next offering might meet your schedule.

Find the course materials on GitHub, right now

What if you can't find a Live Web offering that meets your schedule? In the spirit of openness, the SAS Training team has published the complete course materials on GitHub. You'll find the course notes (over 600 pages), data sets, and over 80 SAS programs to support the course exercises. You can use the free SAS University Edition to try the course exercises yourself and practice with the software. (The only part that you can't practice is the "submit to R" lessons, because the SAS University Edition doesn't support the connection to R.)


Reading data with the SAS JSON libname engine

JSON is the new XML. The number of SAS users who need to access JSON data has skyrocketed, thanks mainly to the proliferation of REST-based APIs and web services. Because JSON is structured data in text format, we've been able to offer simple parsing techniques that use DATA step and most recently PROC DS2. But finally*, with SAS 9.4 Maintenance 4, we have a built-in LIBNAME engine for JSON.

Simple JSON example: Who is in space right now?

Speaking of skyrocketing, I discovered a cool web service that reports who is in space right now (at least on the International Space Station). It's actually a perfect example of a REST API, because it does just that one thing and it's easily integrated into any process, including SAS. It returns a simple stream of data that can be easily mapped into a tabular structure. Here's my example code and results, which I produced with SAS 9.4 Maintenance 4.

filename resp temp;
 
/* Neat service from Open Notify project */
proc http 
 url="http://api.open-notify.org/astros.json"
 method= "GET"
 out=resp;
run;
 
/* Assign a JSON library to the HTTP response */
libname space JSON fileref=resp;
 
/* Print result, dropping automatic ordinal metadata */
title "Who is in space right now? (as of &sysdate)";
proc print data=space.people (drop=ordinal:);
run;
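The response also carries a couple of top-level fields, which the JSON engine places in a ROOT table. As a small sketch that assumes the service still returns its "number" field, and that reuses the SPACE library assigned above, you could capture the head count in a macro variable (the name astrocount is my own choice for illustration):

```sas
/* Top-level fields land in the ROOT table */
data _null_;
  set space.root;
  /* "number" is the count of people reported by the service */
  call symputx('astrocount', number);
run;

%put &=astrocount;
```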

[Image: Who is in space right now? (PROC PRINT output)]

But what if your JSON data isn't so simple? JSON can represent information in nested structures that can be many layers deep. These cases require some additional mapping to transform the JSON representation to a rectangular data table that we can use for reporting and analytics.

JSON map example: Most recent topics from SAS Support Communities

In a previous post I shared a PROC DS2 program that uses the DS2 JSON package to call and parse our SAS Support Communities API. The parsing process is robust, but it requires quite a bit of foreknowledge about the structure and fields within the JSON payload. It also requires many lines of code to extract each field that I want.

Here's a revised pass that uses the JSON engine:

/* split URL for readability */
%let url1=http://communities.sas.com/kntur85557/restapi/vc/categories/id/bi/topics/recent;
%let url2=?restapi.response_format=json%str(&)restapi.response_style=-types,-null,view;
%let url3=%str(&)page_size=100;
%let fullurl=&url1.&url2.&url3;
 
filename topics temp;
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
/* Let the JSON engine do its thing */
libname posts JSON fileref=topics;
title "Automap of JSON data";
 
/* examine resulting tables/structure */
proc datasets lib=posts; quit;
proc print data=posts.alldata(obs=20); run;

Thanks to the many layers of data in the JSON response, here are the tables that SAS creates automatically.

[Image: tables created automatically by the JSON engine]

There are 12 tables that contain various components of the message data that I want, plus the ALLDATA member that contains everything in one linear table. ALLDATA is good for examining structure, but not for analysis. You can see that it's basically name-value pairs with no data types/formats assigned.

[Image: the ALLDATA table]

I could use DATA steps or PROC SQL to merge the various tables into a single denormalized table for my reporting purposes, but there is a better way: define and apply a JSON map for the libname engine to use.

To get started, I need to rerun my JSON libname assignment with the AUTOMAP option. This creates an external file with the JSON-formatted mapping that SAS generates automatically. In my example here, the file lands in the WORK directory with the name "top.map".

filename jmap "%sysfunc(GETOPTION(WORK))/top.map";
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
libname posts JSON fileref=topics map=jmap automap=create;

This generated map is quite long -- over 400 lines of JSON metadata. Here's a snippet of the file that describes a few fields in just one of the generated tables.

"DSNAME": "messages_message",
"TABLEPATH": "/root/response/messages/message",
"VARIABLES": [
{
  "NAME": "ordinal_messages",
  "TYPE": "ORDINAL",
  "PATH": "/root/response/messages"
},
{
  "NAME": "ordinal_message",
  "TYPE": "ORDINAL",
  "PATH": "/root/response/messages/message"
},
{
  "NAME": "href",
  "TYPE": "CHARACTER",
  "PATH": "/root/response/messages/message/href",
  "CURRENT_LENGTH": 19
},
{
  "NAME": "view_href",
  "TYPE": "CHARACTER",
  "PATH": "/root/response/messages/message/view_href",
  "CURRENT_LENGTH": 134
},

By using this map as a starting point, I can create a new map file -- one that is simpler, much smaller, and defines just the fields that I want. I can reference each field by its "path" in the JSON nested structure, and I can also specify the types and formats that I want in the final data.

In my new map, I eliminated many of the tables and fields and ended up with a file that was just about 60 lines long. I also applied sensible variable names, and I even specified SAS formats and informats to transform some columns during the import process. For example, instead of reading the message "datetime" field as a character string, I coerced the value into a numeric variable with a DATETIME format:

{
  "NAME": "datetime",
  "TYPE": "NUMERIC",
  "INFORMAT": [ "IS8601DT", 19, 0 ],
  "FORMAT": ["DATETIME", 20],
  "PATH": "/root/response/messages/message/post_time/_",
  "CURRENT_LENGTH": 8
},
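
To see what that informat and format pair does outside of a JSON map, here is a minimal DATA step sketch. The timestamp is a made-up value for illustration; it is not taken from the communities API.

```sas
data _null_;
  /* Hypothetical ISO 8601 timestamp, 19 characters wide */
  ts_text = '2016-11-30T14:05:00';
  /* Same informat and width as in the map entry */
  ts_num = input(ts_text, is8601dt19.);
  /* Same display format as in the map entry */
  put ts_num= datetime20.;
run;
```

The log should show the numeric datetime value rendered as 30NOV2016:14:05:00.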

I called my new map file 'minmap.map' and then re-issued the libname without the AUTOMAP option:

filename minmap 'c:\temp\minmap.map';
 
proc http
 url= "&fullurl."
 method="GET"
 out=topics;
run;
 
libname posts json fileref=topics map=minmap;
proc datasets lib=posts; quit;
 
data messages;
 set posts.messages;
run;

Here's a snapshot of the single data set as a result.

[Image: the final data set]

I think you'll agree that this result is much more usable than what my first pass produced. And the amount of code is much smaller and easier to maintain than any previous SAS-based process for reading JSON.

Here's the complete program in a public GitHub gist, including my custom JSON map.


* By the way, the JSON libname engine actually made its debut as part of SAS Visual Data Mining and Machine Learning, part of the SAS Viya platform. This is a good example of how work on the new SAS Viya platform continues to benefit the users of the SAS 9.4 architecture.


Using the DATA step debugger in SAS Enterprise Guide

In my earlier post about WHERE and IF statements, I announced that the DATA step debugger has finally arrived in SAS Enterprise Guide. (I admit that I might have buried the lead in that post.) Let's use this post to talk about the new debugger and how it works.

First, let's address some important limitations. This tool is for debugging DATA step code. It can't be used to debug PROC SQL or PROC IML or SAS macro programs. Next, it can't be used to debug DATA steps that read data from CARDS or DATALINES. That's an unfortunate limitation, but it's a side effect of the way the DATA step "debug" mode works with client applications like SAS Enterprise Guide. (Workaround: load your data in a separate step, then debug your more complex DATA step logic in a subsequent step.)
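
That workaround might look like the following sketch. The data set names, variables, and values are invented for illustration. The first step holds the DATALINES and needs no debugging; the second step holds the logic that you would step through in the debugger.

```sas
/* Step 1: load the inline data on its own */
data work.raw;
  input name $ score;
  datalines;
Ann 82
Raj 67
Lee 91
;
run;

/* Step 2: the logic to debug reads from WORK.RAW instead */
data work.graded;
  set work.raw;
  length grade $ 4;
  if score >= 70 then grade = 'Pass';
  else grade = 'Fail';
run;
```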

Ye olde DATA step debugger

1993 called; they want their debugger back

1986 called; they want their debugger back.

If you've been around SAS programs for a while then you might remember the full-screen DATA step debugger in the SAS windowing environment. Introduced as production in SAS 6.09E (E="enhanced!"), it was basic but it did the job, relying on command-line processing to direct the debugger actions. It had only two windows: one for the source, and one for the "log", meaning the debugger console log. You could set breakpoints, variable watch conditions, examine variables and calculate values -- all with commands that you typed. (Even though I'm writing this in the past tense and it seems like I'm eulogizing, this debugger still lives on in Base SAS!)
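
If you have never seen it, the classic debugger is invoked with the DEBUG option on the DATA statement. This is a minimal sketch with a made-up output data set name, and it assumes that you run it in the SAS windowing environment, where the full-screen debugger windows are available.

```sas
/* The DEBUG option launches the full-screen debugger */
data work.heavy / debug;
  set sashelp.class;
  if weight > 100;
run;
```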

The new DATA step debugger

The new debugging environment, introduced in SAS Enterprise Guide 7.13, has all of the features of its ancestor. And it's much more usable, with toolbars and windows that allow you to control its behavior. But keyboard junkies, don't worry -- that command line is still there too!

To activate the debugger, click the new "bug" toolbar icon in the program editor window. Once activated, you can click the bug in the left "gutter" of the program editor to begin a debug session. (You can also press F5 to debug the active DATA step.)

[Image: Starting the Debugger]

Examine the screenshot below. You see the source window on top and the console window at the bottom, plus a convenient "watch" window that shows much of the content in the program data vector (PDV). That's all of the variables defined in the DATA step, plus automatic variables like _N_ and _ERROR_.

[Image: the debugger in SAS Enterprise Guide]

As you step through the DATA step, the line pointer in the source window advances to show the next line that will execute. You can use keyboard shortcuts (F10), the toolbar, or type a command ("step") to execute that line and advance. With every step, the watch window is updated with the latest values of the variables in your step. When a variable changes value, it's colored red. If you want the DATA step to break processing when a certain variable changes value, check the Watch box for that variable.

Diving deeper with advanced debugging

Here's another example of debugging a different DATA step program. This program uses a BY statement and FIRST.variable logic, and you can see the additional automatic variables (FIRST.Make and LAST.Make) that the debugger is tracking. I also used END=eof on the SET statement; that adds the eof "flag" variable into the mix during run time.
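
The exact program appears only in the screenshot, so the following is a sketch of the same pattern rather than a copy of it. It assumes SASHELP.CARS as the input, and the data set names are my own choices. It uses a BY statement, FIRST. and LAST. logic, a running_price accumulator, and END=eof.

```sas
/* BY-group processing needs sorted input */
proc sort data=sashelp.cars out=work.cars;
  by make;
run;

data work.make_totals;
  set work.cars end=eof;
  by make;
  /* Reset the accumulator at the start of each Make */
  if first.make then running_price = 0;
  /* Sum statement: running_price is retained across iterations */
  running_price + msrp;
  /* Write one record per Make */
  if last.make then output;
  if eof then put 'Total records read: ' _n_;
  keep make running_price;
run;
```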

[Image: advanced debugging session]

In the Debug Console window you can see that I've issued some pretty fancy commands. The DATA step debugger allows you to set breakpoints that trigger on specific conditions. For example, "b 8 when (running_price > 10000)" will break on Line 8 when the value of running_price exceeds 10,000. "b 8 after 5" will break on Line 8 after 5 passes through the DATA step. You can set and clear line-specific breakpoints by clicking in the "gutter" (that left-hand margin next to the line numbers).

The "list _all_" command reveals the details about your open data sets and files. Here's what I see during the run of my program.

[Image: output of the list command]

Other commands let you SET variable values, EXAMINE variables, CALCulate expressions, GO and JUMP to specific lines, and more. The SAS documentation contains a complete reference for DATA step debugger commands, and most of those work exactly as documented, even within SAS Enterprise Guide. Here's the list:

This old-but-still-relevant SAS Global Forum paper (written by a SAS user) also covers some useful debugging concepts in SAS which you can apply in this new environment.

A personal note: eating my words

I've presented "SAS Enterprise Guide for SAS programmers" as a topic in one form or another for the past 15 years. Every so often the topic of the DATA step debugger comes up, and I've said "don't look for it anytime soon." Knowing how the full-screen debugger is closely tied to the SAS windowing environment, I didn't hold out hope for a client application like SAS Enterprise Guide to get it working. Kudos to the R&D team! They creatively found a solution with the "/ldebug" option, an even more obscure debugging approach that works in SAS batch mode. I think this feature will be a tremendous productivity boost for experienced SAS programmers, and a useful learning and teaching tool for those just getting started with the DATA step.


Debugging the difference between WHERE and IF in SAS

In the DATA step, the WHERE statement and the IF statement (a.k.a. the "subsetting IF") have similar functions. In many scenarios, they produce identical results. But new SAS programmers are taught early on that these two statements work very differently, and in important ways. To understand the differences, it helps to step through the program line-by-line to see how SAS "thinks." Fortunately, the new DATA step debugger in SAS Enterprise Guide 7.13 makes this really easy to do.

Difference between WHERE statement and IF statement

Here are the basics: the WHERE statement rules are determined when the DATA step is compiled. As the DATA step runs, incoming data (from a SET or MERGE statement) is filtered to just those records that match the WHERE condition, so only those records are ever loaded into the program data vector (PDV). This results in fewer iterations through DATA step code, but provides no opportunity for "dynamic" decisions about which records to examine.

In contrast, the IF statement is evaluated at run time, and operates on the variables after they are already in the PDV. When the IF condition is met, the current observation is kept for eventual output. Unlike the WHERE statement, the IF statement can examine values of new variables that are defined within the step.
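
Here is a small sketch of that last point. The ratio variable and its threshold are invented for illustration. The subsetting IF can test ratio because the value is already in the PDV when the IF statement runs.

```sas
data work.ratio_if;
  set sashelp.class;
  /* New variable, computed within the step */
  ratio = weight / height;
  /* Subsetting IF can examine it */
  if ratio > 1.6;
run;

/* A WHERE statement cannot do the same: RATIO is not a   */
/* variable in SASHELP.CLASS, so the statement            */
/*   where ratio > 1.6;                                   */
/* would cause SAS to report an error for the step        */
```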

Consider these two DATA steps. They produce identical output of 10 records, but the first one processes only those 10 records whereas the second step processes all 19 records from the input.

data results1;
  set sashelp.class;
  /* WHERE applied at compile time  */
  /* Processes ONLY matching obs    */
  where sex='M';
run;
 
data results2;
  set sashelp.class;
  /* IF evaluated at run time  */
  /* Processes EVERY obs       */
  if sex='M';
run;

Using the DATA step debugger to understand the DATA step

The new DATA step debugger in SAS Enterprise Guide makes it very easy to illustrate how WHERE is processed differently from IF. I loaded each of the above programs into my session, then clicked the new "bug" toolbar icon to activate the debugger. Once activated, you can click the bug in the left "gutter" of the program editor to begin a debug session. (You can also press F5 to debug the active DATA step.)

[Image: Starting the Debugger]

Watch this first animation of a debugger session and see what you notice about the WHERE statement.

[Animation: debugger session with the WHERE statement]

Watching this little movie, I see a few things that reveal some insights.

  • The statement pointer never lands on Line 5 (the WHERE statement). That's because the WHERE statement isn't processed at run time.
  • Even though the CLASS data contains 19 records, the value of the _N_ automatic variable reaches only 11, indicating that only 10 records were processed.
  • The variable watch window uses red to indicate when a variable changes between iterations. The Sex variable never changes from 'M', and thus stays colored black through the entire session.

Let's compare that to the IF statement. Study this animation and see what stands out to you.

[Animation: debugger session with the IF statement]

Here's what I see:

  • The statement pointer begins at Line 2, then 5, and moves to Line 6 (the RUN statement) only when the record has made it past the IF condition and into the output. For each observation where Sex='F', the DATA step stops processing the record and the RUN statement is skipped.
  • In this program, _N_ reaches 20 -- that's because all 19 records in SASHELP.CLASS are processed and the step exits at the end-of-file condition.
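
If you don't have the debugger handy, you can confirm the same counts in the log. This sketch records the value of _N_ when each step reads its last record; the macro variable names are my own. Note that these counts are one less than the values seen in the debugger, because the debugger also shows the final iteration that detects the end of the input.

```sas
data _null_;
  set sashelp.class end=eof;
  where sex='M';
  /* Only the matching records ever reach the PDV */
  if eof then call symputx('n_where', _n_);
run;

data _null_;
  set sashelp.class end=eof;
  /* Capture the count before the subsetting IF can stop the iteration */
  if eof then call symputx('n_if', _n_);
  if sex='M';
run;

%put &=n_where &=n_if;
```

With the 19 records in SASHELP.CLASS, 10 of them male, the log should show N_WHERE=10 and N_IF=19.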

Learning more about subsetting IF, IF-THEN, WHERE, and debugging

There are several good articles about how the IF statement works, on its own and in combination with IF-THEN-ELSE constructs. Here's a recent article by SAS trainer Charu Shankar. And here's another reference that's included in a piece about the Top 10 SAS coding efficiencies.

The new DATA step debugger in SAS Enterprise Guide opens a new world of understanding for beginner and veteran SAS programmers. It has all of the functions of the "classic" debugger available in the Base SAS windowing environment, but with a much friendlier user interface, keyboard shortcuts, and useful watch windows. In my next post, I cover the debugging functions in more detail.


Zodiac signs of US Presidents

Rick Wicklin showed us how to visualize the ages of US Presidents at the time of their inaugurations. That's a pretty relevant thing to do, as the age of the incoming president can indirectly influence aspects of the president's term, thanks to health and generational factors.

As part of his post, Rick supplied the complete data set for US Presidents and their birthdays. He challenged his readers to create their own interesting visualizations, and that's what I'm going to do here. I'm going to show you the distribution of US Presidents by their astrological signs.

Now, you might think that "your sign" is not as relevant of a factor as Age, and I certainly hope that you're correct about that. But past presidents have sought the advice of astrologers, and zodiac signs can influence the counsel such astrologers might offer. (Famously, Richard Nixon took advice from celebrity psychic Jeane Dixon. First Lady Nancy Reagan also sought her advice, and we know that Mrs. Reagan in turn influenced President Reagan.)

Like any good analyst, I mostly reused existing work to produce my results. First, I used the DATA step that Rick provided to create the data set of presidents and birthdays. Next, I reused my own work to create a SAS format that displays a zodiac sign for each date. And finally, I wrote a tiny bit of PROC FREQ code to create my table and frequency plot.

data signs;
 /* So this column appears first */
 retain President;
 length sign 8;
 /* SIGN. format created earlier with PROC FORMAT */
 format sign sign.;
 set presidents (keep=President BirthDate InaugurationDate);
 /* convert birthday to our normalized SIGN date */
 sign = mdy(month(birthdate),day(birthdate),2000);
run;
 
ods graphics on;
proc freq data=signs order=freq;
tables sign / plots=freqplot;
run;
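
The SIGN. format that the DATA step relies on isn't shown here, and it must be defined before that step runs. The following is only a sketch of what such a format could look like, not my original code. It maps dates in the year 2000 to signs, and the boundary dates are an assumption, since sources differ by a day or so. Capricorn needs two ranges because it spans the turn of the year.

```sas
proc format;
  value sign
    '01JAN2000'd - '19JAN2000'd = 'Capricorn'
    '20JAN2000'd - '18FEB2000'd = 'Aquarius'
    '19FEB2000'd - '20MAR2000'd = 'Pisces'
    '21MAR2000'd - '19APR2000'd = 'Aries'
    '20APR2000'd - '20MAY2000'd = 'Taurus'
    '21MAY2000'd - '20JUN2000'd = 'Gemini'
    '21JUN2000'd - '22JUL2000'd = 'Cancer'
    '23JUL2000'd - '22AUG2000'd = 'Leo'
    '23AUG2000'd - '22SEP2000'd = 'Virgo'
    '23SEP2000'd - '22OCT2000'd = 'Libra'
    '23OCT2000'd - '21NOV2000'd = 'Scorpio'
    '22NOV2000'd - '21DEC2000'd = 'Sagittarius'
    '22DEC2000'd - '31DEC2000'd = 'Capricorn';
run;
```

Because 2000 is a leap year, a February 29 birthday still maps to a valid date.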

To keep things a bit fresh, I did all of this work in SAS University Edition using the Jupyter Notebook interface. Here's a glimpse of what it looks like:

[Image: PROC PRINT output in Jupyter Notebook]

And here's the distribution you've all been waiting to see. When he takes office, Donald Trump will join George H. W. Bush and JFK in the Gemini column.

[Image: frequency plot of presidents by zodiac sign]

I've shared the Jupyter Notebook file as a public gist on GitHub. You can download and import into your own instance if you have SAS and Jupyter Notebook working together. (Having trouble rendering the notebook file? Try looking at it through the nbviewer service. That usually works.)


The Copy Files task is going legit (and moving)

I've supplied dozens of custom tasks for SAS Enterprise Guide, but the Copy Files task is easily the most popular. The Copy Files task allows you to capture "file transfer" steps inside your process flow, so that you can automate any file upload and download operations between your PC and your SAS workspace session. It has proven to be an essential task for customers who move from using PC SAS to SAS Enterprise Guide. Many of you still need a method to copy data and results to and from your SAS session. When the SAS session is on a remote server, then this task fills that important gap.

Because "Copy files" is a custom task, you have to download the task package (from this blog) and follow a few steps to install the task into your SAS Enterprise Guide environment. When installed, the task can be found in the Tools → Add-In menu.

Copy Files task moves to the Tasks → Data menu

That's about to change with the next release: SAS Enterprise Guide v7.13. We're going to make an honest task out of "Copy Files," as it becomes an official feature in SAS Enterprise Guide. That's great news for a couple of reasons: no more custom install steps, and you can now get official support from SAS Tech Support when using it (although they would have always helped before now). The task works exactly the same way and if you have existing projects that use it, you don't need to make any changes. However, there is one change you need to know about: as an "official" task, it will appear in an official menu location. As of SAS Enterprise Guide 7.13, you'll find Copy Files in the Tasks → Data menu, near the bottom with some other utility-type tasks. And if you had previously installed it as a custom task, it will no longer appear in the Tools → Add-In menu.

SAS Enterprise Guide 7.13 is set to release within the next couple of weeks (near the end of November 2016), and it contains several exciting new features that I'll describe in this blog. Many of you will see it immediately when SAS Enterprise Guide prompts you to update. Stay tuned!

Binge on this series: Fun with ODS Graphics

SAS Community member @tc (a.k.a. Ted Conway) has found a new toy: ODS Graphics. Using PROC SGPLOT and GTL (Graph Template Language), along with some creative data prep steps, Ted has created several fun examples that show off what you can do with a bit of creativity, some math knowledge, and open data.

And bonus -- since most of his examples work with SAS University Edition, it's easy for you to try them yourself. Here are some of my favorites.

Learn to draw a Jack-O-Lantern

Using the GIF output device and free data from Math-Aids.com, Ted shows how to use GTL (PROC TEMPLATE and PROC SGRENDER) to animate this Halloween icon.

[Image: Learn to draw a Jack-O-Lantern]

The United Polygons of America

Usually map charts with SAS require specialized procedures and map data, but here's a technique that can plot a stylized version of the USA and convey some interesting data. (You might have seen this one featured in a SAS Tech Report newsletter. Do you subscribe?)

[Image: United Polygons of America]

A look at Katie Ledecky's dominance

Using a vector plot, Ted shows how this championship swimmer dominated her event during the summer games in Rio. This example contains a lot of text information too, and that's a cool trick that PROC SGPLOT makes possible with the AXISTABLE statement. Click on the image for a closer look.

[Image: Katie Ledecky dominates]
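
If you haven't tried the axis table statements yet, here is a minimal sketch of the idea. This is not Ted's program: it assumes SAS 9.4 and uses the SASHELP.CLASS sample data, with a bar chart standing in for his vector plot. The XAXISTABLE statement adds a row of text values beneath the bars, aligned with the X axis.

```sas
/* Sketch only: a bar chart with a row of text values under the axis. */
/* Assumes SAS 9.4 (axis tables) and the SASHELP.CLASS sample data.   */
proc sgplot data=sashelp.class;
  /* one bar per age, showing the mean height */
  vbar age / response=height stat=mean;
  /* text row aligned with each bar: the mean weight for that age */
  xaxistable weight / stat=mean label='Mean weight';
  yaxis label='Mean height';
run;
```

A YAXISTABLE statement works the same way along the Y axis, which is handy for row-oriented layouts.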

Demonstrating the Bublé Sort

This example is nerdy on so many levels. It's a take on the Computer Science 101 concept of "bubble sort," an algorithm for placing a collection of items in a desired order. In this case, the items consist of Christmas songs recorded by Michael Bublé, that dreamy crooner from Canada.
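
In case Computer Science 101 was a while ago, here is a rough sketch of a plain bubble sort in a DATA step. It is not Ted's code: the array name and the six sample values are made up for illustration. Each pass compares neighbors and swaps any pair that is out of order, so the largest remaining value "bubbles" to the end.

```sas
/* Sketch only: a textbook bubble sort on a small array. */
/* The array name and sample values are illustrative.    */
data _null_;
  array v[6] v1-v6 (5 2 9 1 7 3);
  n = dim(v);

  /* repeat passes until a pass makes no swaps */
  do until (not swapped);
    swapped = 0;
    do j = 1 to n - 1;
      if v[j] > v[j+1] then do;
        /* neighbors are out of order, so swap them */
        tmp    = v[j];
        v[j]   = v[j+1];
        v[j+1] = tmp;
        swapped = 1;
      end;
    end;
    /* the largest value of this pass is now in place */
    n = n - 1;
  end;

  /* write the sorted values to the log */
  put v1-v6;
run;
```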

[Image: See the songs sort things out]

Ted posts these examples (and more) in the SAS/GRAPH and ODS Graphics section of SAS Support Communities. That's a great place to learn SAS graphing techniques, from simple to advanced, and to see what other practitioners are doing. Experts like Ted hang out there, and the SAS visualization developers often post answers to the tricky questions.

More from @tc

In addition to his community posts, Ted is an award-winning contributor to SAS Global Forum with some very popular presentations. Here are a few of his papers.

Tip: How to close all data sets in SAS Enterprise Guide

Have you seen this error when running a program in SAS Enterprise Guide?

ERROR: You cannot open WORK.YOURDATA.DATA for output access with member-level 
control because WORK.YOURDATA.DATA is in use by you in resource environment IOM 
ROOT COMP ENV.

Or maybe:

ERROR: A lock is not available for LIB.YOURDATA.DATA.
NOTE: The SAS System stopped processing this step because of errors.

It has a simple cause: the data set that your program is trying to write (or rewrite) is open in the data viewer. With regard to this data file, your program is in contention with the SAS Enterprise Guide application.

Usually SAS Enterprise Guide closes all open data sets before running a program or task, and that's meant to help you avoid this error. But sometimes a data set file remains open for one reason or another, and the conflict results in the error message. Fortunately, there is a simple fix.

[Image: Close All data sets window]

Select Tools->View Open Data Sets. The View Open Data Sets window shows the names of the data files that SAS Enterprise Guide has open. And it offers a convenient Close All button to clear the list. Closing the data doesn't affect the contents of the file or its place in your project. It simply removes the lock that SAS Enterprise Guide is holding on the file.

If you are running multiple SAS Enterprise Guide sessions, it's possible for one session to have a lock on a file that you're trying to update in another session. The View Open Data Sets window shows only those data sets from your current session, so be sure to check your other projects if you're multitasking.

The default behavior -- close all data before running SAS programs -- is controlled in Tools->Options->SAS Programs. If you don't want SAS Enterprise Guide to close your data windows, clear that checkbox. (It's difficult for me to imagine why you would do that...but hey, we have options for everything.)

List the contents of your ZIP files using SAS

SAS programmers often resort to using the X command to list the contents of file directories and to process the contents of ZIP files. In centralized SAS environments, the X command is unavailable to most programmers. NOXCMD is the default setting for these environments (disallowing shell commands), and SAS admins are reluctant to change it.

Update 28Nov2016: I updated this article to remove the text about gz (gzip) file support. Currently, the FILENAME ZIP method works only with ZIP files -- on Windows and Unix.

In this article, I'll share a SAS program that can retrieve the contents of a file directory (all of the file names), and then also report on the contents of every ZIP file within that directory -- without using any shell commands. The program uses two lesser-known tricks to retrieve the information:

  1. The FILENAME statement can be applied to a directory, and then the DOPEN, DNUM, DREAD, and DCLOSE functions can be used to retrieve information about that directory. (Check SAS Note 45805 for a better example of just this - click the Full Code tab.)
  2. The FILENAME ZIP method (added in SAS 9.4) can retrieve the names of the files within a compressed archive (ZIP files). For more information, see all of my previous articles about the FILENAME ZIP access method.

I wrote the program as a SAS macro so that it's easy to reuse. And I tried to be liberal with the comments, providing a view into my thinking and maybe some opportunities for improvement.

%macro listzipcontents (targdir=, outlist=);
  filename targdir "&targdir";
 
  /* Gather all ZIP files in a given folder                */
  /* Searches just one folder, not subfolders              */
  /* for a fancier example see                             */
  /* http://support.sas.com/kb/45/805.html (Full Code tab) */
  data _zipfiles;
    length fid 8;
    fid=dopen('targdir');
 
    if fid=0 then
      stop;
    memcount=dnum(fid);
 
    /* Save just the names ending in ZIP*/
    do i=1 to memcount;
      memname=dread(fid,i);
      /* combo of reverse and =: to match ending string */
      /* Looking for *.zip files */
      if (reverse(lowcase(trim(memname))) =: 'piz.') then
        output;
    end;
 
    rc=dclose(fid);
  run;
 
  filename targdir clear;
 
  /* get the memnames into macro vars */ 
  proc sql noprint;
    select memname into: zname1- from _zipfiles;
    %let zipcount=&sqlobs;
  quit;
 
  /* for all ZIP files, gather the members */
  %do i = 1 %to &zipcount;
    %put &targdir/&&zname&i;
    filename targzip ZIP "&targdir/&&zname&i";
 
    data _contents&i.(keep=zip memname);
      length zip $200 memname $200;
      zip="&targdir/&&zname&i";
      fid=dopen("targzip");
 
      if fid=0 then
        stop;
      memcount=dnum(fid);
 
      do i=1 to memcount;
        memname=dread(fid,i);
 
        /* save only full file names, not directory names */
        if (first(reverse(trim(memname))) ^='/') then
          output;
      end;
 
      rc=dclose(fid);
    run;
 
    filename targzip clear;
  %end;
 
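  /* Combine the member names into a single data set        */
  /* the colon notation matches all files with "_contents" prefix */
  /* Skip this step when no ZIP files were found, because   */
  /* the _contents: list would then match no data sets      */
  %if &zipcount > 0 %then %do;
    data &outlist.;
      set _contents:;
    run;
  %end;
  %else %put NOTE: No ZIP files found in &targdir..;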
 
  /* cleanup temp files */
  proc datasets lib=work nodetails nolist;
    delete _contents:;
    delete _zipfiles;
  quit;
 
%mend;

Use the macro like this:

%listzipcontents(targdir=c:\temp, 
 outlist=work.allfiles);
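
Once the macro has run, the output is an ordinary data set with two columns (zip and memname), so you can report on it however you like. As a small sketch, assuming the call above succeeded and created work.allfiles, this step counts the members in each archive. The names work.zipsummary and member_count are my own choices for illustration.

```sas
/* Sketch only: count the members found in each ZIP file.       */
/* Assumes work.allfiles was created by the macro call above.   */
/* work.zipsummary and member_count are illustrative names.     */
proc sql;
  create table work.zipsummary as
    select zip,
           count(*) as member_count
    from work.allfiles
    group by zip
    order by zip;
quit;
```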

Here's an example of the output.

[Image: ZIP file contents within the target directory]

Experience has taught me that savvy SAS programmers will scrutinize my example code and offer improvements. For example, they might notice my creative use of the REVERSE function and "=:" operator to simulate an "ends with" comparison function -- and then suggest something better. If I don't receive at least a few suggestions for improvements, I'll know that no one has read the post. I hope I'm not disappointed!
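
To get the suggestions started, here is a sketch that puts the REVERSE technique next to two possible alternatives: a regular expression with PRXMATCH, and a comparison of the last four characters with SUBSTRN. The sample file names and the variable names are invented for this demo, and I'm not claiming that either alternative is the better choice. Run it and compare the three flag columns for yourself.

```sas
/* Sketch only: three ways to test whether a name ends with ".zip". */
/* The sample names and variable names are made up for this demo.   */
data work.endswith_demo;
  length memname $200;
  do memname = 'archive.zip', 'REPORT.ZIP', 'notes.zip.txt', 'readme';
    /* 1. technique from the macro: reverse, then prefix-compare */
    via_reverse = (reverse(lowcase(trim(memname))) =: 'piz.');

    /* 2. regular expression: ".zip" at the end of the value,   */
    /*    ignoring case and any trailing blanks                 */
    via_prx = (prxmatch('/\.zip\s*$/i', memname) > 0);

    /* 3. last four characters, with a length check so that     */
    /*    names shorter than four characters never match        */
    via_substr = (lengthn(memname) >= 4 and
                  lowcase(substrn(memname, lengthn(memname) - 3, 4)) = '.zip');
    output;
  end;
run;

proc print data=work.endswith_demo;
run;
```

One difference to keep in mind: the "=:" operator compares only as many characters as the shorter operand contains, so it's worth testing very short file names if you stay with the REVERSE approach.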
