I've used SAS with a bunch of different REST APIs: GitHub, Brightcove, Google Analytics, Lithium, LinkedIn, and more. For most of these I have to send user/password or "secret" application tokens to the web service so that it knows who I am and what data I can retrieve. I do not want to keep this secret information in my SAS program files -- that would be a bad idea. If my credentials were part of the program -- even if they were obfuscated and not stored in clear text -- then anyone who managed to get a copy of my program could run it. And they could gain access to my data, as if they were me.
I've written about this topic for SAS-related passwords. In this article, I'll share the approach that I use for API credentials and tokens.
REST APIs: Each service requires different types of secrets
My REST API services don't require just simple user ID and password combos. It depends on the API, but usually the information is in the form of one or more tokens that I've generated using the vendor's developer console, or perhaps that have been granted by an administrator.
For example, to access the Google Analytics API, I need three things: a client ID, a client secret token, and a valid "refresh" token. I can send these three items to the Google OAuth2 API, and in return I'll receive a live "access" token that I can use to request my data. I think of this like checking into a hotel. I show my ID and a credit card at the front desk, and in exchange I receive a room key. Just like my hotel room key, the access token doesn't last forever and cannot be reused on my next visit.
Other APIs are simpler and require just a single token that never expires. That's more like a house key -- it's mine to use forever, or until someone decides to change the locks.
Whether a static token or a token-for-token exchange, I don't want to leave these keys lying around for just anyone to find and use.
Hide your tokens in a file that only you can read
My simple approach is to store the token values in a text file within my home directory. Then, I change the permissions on the file such that only my account can read it. Whether I submit my program interactively (in SAS Enterprise Guide or SAS Studio) or as a scheduled batch job, it's running under my account. I'm showing the instructions here for UNIX/Linux, but Windows users can accomplish something similar with Windows permissions.
On Linux, I've used the chmod command to specify the octal value that says "only the owner can read/write." That's "chmod 600 filename". The "ls -l" command shows that this permissions mask has been applied.
chmod 600 ./.google_creds.csv ls -l ./.google_creds.csv > -rw------- 1 myid mygroup 184 Jan 15 12:41 ./.google_creds.csv
I stored my tokens in a standard CSV format because it's easy for SAS to read and it's easy for me to read if I ever need to change it.
Use INFILE to read the tokens dynamically
With this critical data now stored externally, and the file permissions in place, I can use SAS to read the credentials/tokens within my program and store the values in SAS macro variables. In the following SAS program, I assigned a macro variable to my user root folder. Since I might run this program on Linux or Windows, I used this trick to determine the proper path notation. I also used the &SYSUSERID macro variable to make my program more portable. If I want to supply this program to any colleagues (or to you!), the only thing that's needed is to create and store the token CSV files in the proper location.
/* My path is different for UNIX vs Windows */ %let authpath = %sysfunc(ifc(&SYSSCP. = WIN, \\netshare\root\u\&sysuserid., /u/&sysuserid.)); /* This should be a file that only YOU or trusted group members can read */ /* Use "chmod 0600 filename" in UNIX environment */ /* "dotfile" notation is convention for on UNIX for "hidden" */ filename auth "&authpath./.google_creds.csv"; /* Read in the secret account keys from another file */ data _null_; infile auth firstobs=2 dsd delimiter=',' termstr=crlf; length client_id $ 100 client_secret $ 30 refresh_token $ 60; input client_id client_secret refresh_token; call symputx('client_id',client_id); call symputx('client_secret',client_secret); call symputx('refresh_token',refresh_token); run;
When I run this code in my production job, I can see the result:
NOTE: The infile AUTH is: Filename=/u/myid/.google_creds.csv, Owner Name=myid,Group Name=mygroup, Access Permission=rw-------, Last Modified=Mon Jan 15 12:41:58 2018, File Size (bytes)=184 NOTE: 1 record was read from the infile AUTH. The minimum record length was 145. The maximum record length was 145. NOTE: DATA statement used (Total process time): real time 0.01 seconds user cpu time 0.01 seconds
For this example, my next step is to call the Google API to get my access token. I'll use the macro variables that my program created with CALL SYMPUTX to build the proper API call.
/* Call Google API to exchange the refresh token for an active access token */ %let oauth2=https://www.googleapis.com/oauth2/v4/token; filename rtoken temp; proc http method="POST" url="&oauth2.?client_id=&client_id.%str(&)client_secret=&client_secret.%str(&)grant_type=refresh_token%str(&)refresh_token=&refresh_token." out=rtoken; run;
See the full explanation of this Google Analytics example in this article.
Great tip! I'd only add that this can expose sensitive credentials via the log, and so one more step is necessary - that is, to ensure your program is executed via a `%include` with the `nosource2` option in effect.
Absolutely -- if you're running this code in a shared environment where your logs might be seen by others, then that's an important consideration.
This storing on secure OS location is only the first step. The issues to be solved are:
- a regular operational should not run by some personal account, use a dedicated limited service account for that
- Adminitrators on the OS level could see and copy all plain text passwords in your secure location.
Remember: root has all acces without logging
1+ for one thing you are fine with this the approach is a generic clear understandable one without any dependencies
+ close the logging loophole while reviewing code there are manu sas tips to do that
Using SAS code it is rather easy to improve. What you could do:
- Store the real passwords in an encrypted SAS dataset using a password on that dataset.
In reality that password is used for encryption and is not related to real access security.
- use the password stored at the OS for accessing that SAS dataset.
- Have the text read from a record out of that SAS dataset being put in the real source location
No need for intermediate macrovars or whatever as sas functions exist to read one record- column as a macro result.
What is additional achieved:.
The real user / password / token access or whatever is protected by
2+ SAS processing is needed to read that sas dataset. Ordinary OS admins normally don''t know anything about that
3+ For the service account some shared access is allowed by a group of admins. No dependency on a single person,
You know what Jaap? I actually agree with you on all of these points. (I know, it's weird for me too...)
I agree that production jobs should be run by a service account. I should establish that for some of my work, but then it turns into an IT project and as you know, it adds another layer of management. For important jobs that's justified.
The PW-protected data set? That does significantly obscure the credentials and adds another level of indirection, but in the end anyone with the data set password could replay the access to these secrets. It makes the process more difficult to untangle for those not in the know, but it doesn't add true security.
Pingback: Using SAS to access Google Analytics APIs - The SAS Dummy
Pingback: Authentication to SAS Viya: a couple of approaches - SAS Users
Pingback: Using SAS with Microsoft OneDrive and SharePoint Online - The SAS Dummy
Pingback: Using SAS with Microsoft 365 (OneDrive, Teams, and SharePoint) - The SAS Dummy