SAS functions to encode and decode data for the Web

0

Sir Tim Berners-Lee is famous for inventing the World Wide Web and for the construction of URLs -- a piece of syntax that every 8-year-old is now familiar with. According to the lore, when Sir Tim invented URLs he did not imagine that Internet surfers of all ages and backgrounds would be expected to type these cryptic schemes into their browser windows...but here we are. Today's young people can navigate the Web with URLs in the same way that migratory birds can find their way South for the winter -- by pure instinct.

URLs are syntax, the language of Web navigation. And HTML is syntax too, the language of the Web page. Each of these represent instructions to our web browser, telling it to how to go somewhere or display something. As we navigate the web programmatically, it is sometimes necessary to encode information in a URL or in HTML in a way that it won't be mistakenly interpreted as syntax. And of course, SAS has some functions for that.

Here's a table with links to the SAS documentation for these functions. You'll find they are intuitively named. The names are the same or similar to corresponding functions in other programming languages -- you can get only so creative with basic functions like these.

HTMLDECODE Function   Decodes a string that contains HTML numeric character references or HTML character entity references, and returns the decoded string.
HTMLENCODE Function   Encodes characters using HTML character entity references, and returns the encoded string.
URLDECODE Function   Returns a string that was decoded using the URL escape syntax.
URLENCODE Function   Returns a string that was encoded using the URL escape syntax.

Of these four functions, I use URLENCODE the most often. I need it when I need to pass syntax for instructions to a REST API, in which the API call itself is a URL. Here's an example from my paper about accessing Google Analytics with SAS:

%let workdate='01Oct2017'd;
%let urldate=%sysfunc(putn(&workdate.,yymmdd10.));
%let metrics=%sysfunc(urlencode(%str(ga:pageviews,ga:sessions)));
%let id=%sysfunc(urlencode(%str(ga:XXXXXX)));
filename garesp temp;
proc http
  url="https://www.googleapis.com/analytics/v3/data/ga?ids=&id.%str(&)start-date=&urldate.%str(&)end-date=&urldate.%str(&)metrics=&metrics.%str(&)max-results=20000"
  method="GET" out=garesp;
  headers 
    "Authorization"="Bearer &access_token."
    "client-id:"="&client_id.";
 run;

In the previous example, I need to include "ga:pageviews,ga:sessions" as an instruction on the Google Analytics API, but the colon character has special meaning within the URL syntax. I need to "escape" the colon character so that the URL parser ignores it and simply passes the value to the API. The URLENCODE function converts this segment to "ga%3Apageviews,ga%3Asessions". The colon has been replaced by its hexadecimal code, set off by a percent sign: %3A.

An aside: the percent (%) and ampersand (&) characters have special meaning in URLs. They also have special meaning in the SAS macro language. We can claim that the SAS macro language preceded URL syntax by decades, but there are only so many characters on the keyboard that syntax designers can use to set off code instructions. I use the %str() macro function to tell SAS to process certain URL segments as literals and don't allow the macro parser to have a turn at them. This helps my program to avoid warnings like: "WARNING: Apparent symbolic reference END not resolved."

HTMLENCODE is useful when you need to represent HTML syntax in your output, but prevent the web browser from interpreting the HTML as code. Here's a simple example.

filename out temp;
ods html5 file=out;
data link;
  site = "sas.com";
  link = "<a href='https://www.sas.com'>sas.com</a>";
  code_as = htmlencode(link);
run;
 
proc print data=link;
run;
ods html5 close;

When produced using the HTML5 destination, the link variable is formatted as a live link, while the code_as variable shows the syntax that went into it.

As you might expect, the URLDECODE function "unescapes" the URL hex characters and restores the original URL syntax. HTMLDECODE does the same for HTML content. If you are writing code that implements an API endpoint in SAS (as you might do with a SAS stored process on the back end of a web service), you'll find these functions useful to unpack the information that was encoded on an API call.

HTMLENCODE and URLENCODE are not interchangeable. More than once, I have written programs that mistakenly use HTMLENCODE when it was URLENCODE that was needed. Those mistakes can be tricky to debug, so pay attention!

Cover image by Fabio Lanari, Internet2, CC BY-SA 4.0

Share

About Author

Chris Hemedinger

Director, SAS User Engagement

+Chris Hemedinger is the Director of SAS User Engagement, which includes our SAS Communities and SAS User Groups. Since 1993, Chris has worked for SAS as an author, a software developer, an R&D manager and a consultant. Inexplicably, Chris is still coasting on the limited fame he earned as an author of SAS For Dummies

Comments are closed.

Back to Top