Update 02Dec2016: Beginning with SAS 9.4 Maintenance 4, there is now a JSON libname engine. Read this new article to learn more -- you might prefer it to using DS2 for this task!
Thanks to the proliferation of cloud services and REST-based APIs, SAS users have been making use of PROC HTTP calls (to query these web services) and some creative DATA step or PROC GROOVY code to process the JSON results. Such methods get the job done (JSON is simply text, after all), but they aren't as robust as an official JSON parser. JSON is simple: it's a series of name-value pairs that represent an object in JavaScript. But these pairs can be nested within one another, so in order to parse the result you need to know about the object structure. A parser helps with the process, but you still need to know the semantics of any JSON response.
SAS 9.4 introduced PROC JSON, which allows you to create JSON output from a data set. But it wasn't until SAS 9.4 Maintenance 3 that we have a built-in method to parse JSON content. This method was added as a DS2 package: the JSON package.
I created an example of the method working -- using an API that powers our SAS Support Communities! The example queries communities.sas.com for the most recent posts to the SAS Programming category. Here's a small excerpt of the JSON response.
"post_time": "2015-09-28T16:29:05+00:00", "views": { "count": 1 }, "subject": "Re: How to code for the consecutive values", "author": { "href": "\/users\/id\/13884", "login": "ballardw"
Notice that some items, such as post_time, are simple one-level values. But other items, such as views or author, require a deeper dive to retrieve the value of interest ("count" for views, and "login" for author). The DS2 JSON parser can help you to navigate to those values without you needing to know how many braces or colons or commas are in your way.
Here is an example of the result: a series plot from PROC SGPLOT and a one-way frequency analysis from PROC FREQ. The program also produces a detailed listing of the messages, the topic content, and the datetime stamp.
This is my first real DS2 program, so I'm open to feedback. I already know of a couple of improvements I should make, but I want to share it now as I think it's good enough to help others who are looking to do something similar.
The program requires SAS 9.4 Maintenance 3. All of the code runs using just Base SAS procedures.
/* DS2 program that uses a REST-based API */ /* Uses http package for API calls */ /* and the JSON package (new in 9.4m3) */ /* to parse the result. */ proc ds2; data messages (overwrite=yes); /* Global package references */ dcl package json j(); /* Keeping these variables for output */ dcl double post_date having format datetime20.; dcl int views; dcl nvarchar(128) subject author board; /* these are temp variables */ dcl varchar(65534) character set utf8 response; dcl int rc; drop response rc; method parseMessages(); dcl int tokenType parseFlags; dcl nvarchar(128) token; rc=0; * iterate over all message entries; do while (rc=0); j.getNextToken( rc, token, tokenType, parseFlags); * subject line; if (token eq 'subject') then do; j.getNextToken( rc, token, tokenType, parseFlags); subject=token; end; * board URL, nested in an href label; if (token eq 'board') then do; do while (token ne 'href'); j.getNextToken( rc, token, tokenType, parseFlags ); end; j.getNextToken( rc, token, tokenType, parseFlags ); board=token; end; * number of views (int), nested in a count label ; if (token eq 'views') then do; do while (token ne 'count'); j.getNextToken( rc, token, tokenType, parseFlags ); end; j.getNextToken( rc, token, tokenType, parseFlags ); views=inputn(token,'5.'); end; * date-time of message (input/convert to SAS date) ; * format from API: 2015-09-28T10:16:01+00:00 ; if (token eq 'post_time') then do; j.getNextToken( rc, token, tokenType, parseFlags ); post_date=inputn(token,'anydtdtm26.'); end; * user name of author, nested in a login label; if (token eq 'author') then do; do while (token ne 'login'); j.getNextToken( rc, token, tokenType, parseFlags ); end; * get the author login (username) value; j.getNextToken( rc, token, tokenType, parseFlags ); author=token; output; end; end; return; end; method init(); dcl package http webQuery(); dcl int rc tokenType parseFlags; dcl nvarchar(128) token; dcl integer i rc; /* create a GET call to the API */ /* 'sas_programming' covers all SAS programming topics from communities */ webQuery.createGetMethod( 'http://communities.sas.com/kntur85557/' || 'restapi/vc/categories/id/sas_programming/posts/recent' || '?restapi.response_format=json' || '&restapi.response_style=-types,-null&page_size=100'); /* execute the GET */ webQuery.executeMethod(); /* retrieve the response body as a string */ webQuery.getResponseBodyAsString(response, rc); rc = j.createParser( response ); do while (rc = 0); j.getNextToken( rc, token, tokenType, parseFlags); if (token = 'message') then parseMessages(); end; end; method term(); rc = j.destroyParser(); end; enddata; run; quit; /* Add some basic reporting */ proc freq data=messages noprint; format post_date datetime11.; table post_date / out=message_times; run; ods graphics / width=2000 height=600; title '100 recent message contributions in SAS Programming'; title2 'Time in GMT'; proc sgplot data=message_times; series x=post_date y=count; xaxis minor label='Time created'; yaxis label='Messages' grid; run; title 'Board frequency for recent 100 messages'; proc freq data=messages order=freq; table board; run; title 'Detailed listing of messages'; proc print data=messages; run; title; |
I also shared this program on the SAS Support Communities as a discussion topic. If you want to contribute to the effort, please leave me a reply with your suggestions and improvements!
16 Comments
Wonderful piece! I guessed SAS would come out a "libname json" way to deal with JSON files. This method is just elegant!
Thanks JT! I hope you'll blog about your experiences if you use this.
Are there any other use examples of parsing JSON? And this is only available in 9.4 maintenance 3? At 9.4 (TS1M1).
Currently use groovy to parse and is at times ugly to get data out of json file.
Dave,
Yes, the JSON package was added in 9.4m3. Simple JSON is easy to parse with DATA step and SCAN functions, but more complex JSON really needs the parser. Do you want to supply an example JSON excerpt and we'll work out the parsing? You could post on communities.sas.com and I'm sure that someone would jump in to offer an example.
Is it possible to use the HTTP package over a proxy? I can't seem to see an option in the documentation, whereas it's possible using this:
filename url ... proxyhost= proxyport=
Mark, I don't see a way. PROC HTTP has more robust syntax. I'm checking with the developer for the DS2 answer.
Hi Chris, any updates from the developers? I'm struggling because the HTTP request returns a single line of JSON that can be many megabytes. I can drop this into a file using PROC HTTP, but then DS2 cannot read from files in the conventional sense (as far as I'm aware) and conventional data-step does not easily parse out JSON like DS2 does.
Hi Mark, no updates to share. But you could pose the question in the SAS Support Communities and "at-mention" @SASJedi (Mark Jordan) to see if he can provide a creative solution. He wrote the book on DS2!
Thanks Chris, this is a great post. Great to see that importing JSON has finally been addressed properly (and for once I actually have the latest version of SAS and can make use of it). This will be my first foray into DS2 as well so that should be fun =)
Enjoy!
So I've finished my first project with the JSON package and my final thoughts are that this is a welcome addition to the SAS toolset, but it still leaves most of the heavy lifting to be done by the implementer. This is great if you need a lot of control over the end result, or need to focus on performance. Not so great if you just want something more along the lines of the SAS XML Mapper which allows you to use an interface to get your hands on usable data faster, and without coding (although it is very slow if you are processing thousands of XML files). The technical skills required also means that a lot of people with limited coding ability may struggle to make use of it. Still, I'm not one to look a gift horse in the mouth. Can't wait to see what the next iteration of SAS/JSON integration looks like.
Robert - thanks for the feedback! I agree that the DS2 parser is great for people who need the fine control. You can see a hint of the SAS-JSON future in SAS Viya -- check the doc here. This capability is also planned for a future iteration of SAS 9.4.
Thanks Chris - that looks very promising indeed. Especially the AUTOMAP option =)
Pingback: Reading data with the SAS JSON libname engine - The SAS Dummy
Hello .... I am successfully using the json package.
This sample code, and pretty much all of the documentation that I can find, repeat the same thing. and do not answer a fundamental question.
rc = j.createParser( response );
do while (rc = 0);
j.getNextToken( rc, token, tokenType, parseFlags);
if (token = 'message') then
parseMessages();
end;
end;
method term();
rc = j.destroyParser();
a) I understand that you MUST call createParser, every time that your response changes.
(aka, I make 5000 calls, getting 5000 different responses)
b) BUT - do you need to call destroyParser for every new response? or just once at the end to "cleanup"?
I guess that I can try, but am lookinig for a definitely answer, and cant find one.
THANKS
PC
I'd say if you call createParser each time, you should call destroyParser as well. DS2 will clean up at the end of the PROC regardless, but for memory management while the procedure is running, I think that matching create/destroy calls are the best practice.