Boaty McBoatface is on the run

I know what you're thinking: two "Boaty McBoatface" articles within two weeks? And we're past April Fool's Day?

But since I posted my original analysis about the "Name our ship" phenomenon that's happening in the UK right now, a new contender has appeared: Poppy-Mai.

The cause of Poppy-Mai, a critically ill infant who has captured the imagination of many British citizens (and indeed, of the world), has made a very large dent in the lead that Boaty McBoatface holds.

poppymai
Yes, "Boaty" still has a-better-than 4:1 lead. But that's a lot closer than the 10:1 lead (over "Henry Worsley") from just over a week ago. Check out the box plot now: you can actually make out a few more dots. Voting is open for another 10 days -- and as we have seen, a lot can happen in that time.

poppybox
As I take this second look at the submissions (now almost 6300) and voting data (almost 350,000 votes cast), I've found a few more entries that made me chuckle. Some of them struck me by their word play, and others cater to my nerdy sensibilities. Here they are (capitalization retained):

While I'm on this topic, I want to give a shout-out to regex101, the online regular expression tester. I was able to develop and test my regular expressions before dropping them into a PRXPARSE function call. I found that I had to adjust my regular expression to cast a wider net for valid titles from the names submissions data. Previously, I wasn't capturing all of the punctuation. While that's probably because I didn't expect punctuation to be part of a ship's name, that assumption doesn't stop people from suggesting and voting on such names. My new regex match:

  title_regex = prxparse("/\'title\':\s?\""([a-zA-Z0-9\'\.\-\_\#\s\$\%\&\(\)\@\!]+)/");

I could probably optimize by specifying an exception pattern instead of an inclusion pattern...but this isn't the sort of project where I worry about that.

Will I write about Boaty McBoatface again? What will my next Boaty article reveal? Stay tuned!

Post a Comment

And it's Boaty McBoatface by an order of magnitude

In a voting contest, is it possible for a huge population to get behind a ridiculous candidate with such force that no other contestant can possibly catch up? The answer is: Yes.

Just ask the folks at NERC, the environmental research organization in the UK. They are commissioning a new vessel for polar research, and they decided to crowdsource the naming process. Anyone in the world is welcome to visit their NameOurShip web site and suggest a name or vote on an existing name submission.

As of today, the leading name is "RRS Boaty McBoatface." ("RRS" is standard prefix for a Royal Research Ship.) This wonderfully creative name is winning the race by more than just a little bit: it has 10 times the number of votes as the next highest vote getter, "RRS Henry Worsley".

I wondered whether the raw data for this poll might be available, and I was pleased to find it embedded in the web page that shows the current entries. The raw data is in JSON format, embedded in the source of the HTML page. I saved the web page source to my local machine, copied out just the JSON line with the submissions data, then used SAS to parse the results. Here's my code:

filename records "c:\projects\votedata.txt";
 
data votes (keep=title likes);
 length likes 8;
 format likes comma20.;
 label likes="Votes";
 length len 8;
 infile records;
  if _n_ = 1 then
    do;
      retain likes_regex title_regex;
      likes_regex = prxparse("/\'likes\'\:\s?([0-9]*)/");
      title_regex = prxparse("/\'title\':\s?\""([a-zA-Z0-9\'\s]+)/");
    end;
 input;
 
 position = prxmatch(likes_regex,_infile_);
  if (position ^= 0) then
    do;
      call prxposn(likes_regex, 1, start, len);
      likes = substr(_infile_,start,len);
    end;
 start=0; len=0;
 
 position = prxmatch(title_regex,_infile_);
  if (position ^= 0) then
    do;
      call prxposn(title_regex, 1, start, len);
      title = substr(_infile_,start,len);
    end;
run;

With the data in SAS, I used PROC FREQ to show the current tally:

title "Vote tally for NERC's Name Our Ship campaign";
proc freq data=votes order=freq;
table title;
weight likes;
run;

boatfacefreq
The numbers are compelling: good ol' Boaty Mac has over 42% of the nearly 200,000 votes. The arguably more-respectable "Henry Worsley" entry is tracking at just 4%. I'm not an expert on polling and sample sizes, but even I can tell that Boaty McBoatface is going to be tough to beat.

To drive the point home a bit more, let's look at a box plot of the votes distribution.

title "Distribution of votes for ALL submissions";
proc sgplot data=votes;
hbox likes;
xaxis valueattrs=(size=12pt);
run;

In this output, we have a clear outlier:
hboxall
If we exclude Boaty, then it shows a slightly closer race among the other runners up (which include some good serious entries, plus some whimsical entries, such as "Boatimus Prime"):

title "Distribution of votes for ALL submissions except Boaty McBoatface";
proc sgplot data=votes(where=(title^="Boaty McBoatface"));
hbox likes;
xaxis valueattrs=(size=12pt);
run;

hboxtrim
See the difference between the automatic axis values between the two graphs? The tick marks show 80,000 vs. 8,000 as the top values.

Digging further, I wondered whether there were some recurring themes in the entries. I decided to calculate word frequencies using a technique I found on our SAS Support Communities (thanks to Cynthia Zender for sharing):

/* Tally the words across all submissions */
data wdcount(keep=word);
    set votes;
    i = 1;
    origword = scan(title,i);
    word = compress(lowcase(origword),'?');
    wordord = i;
    do until (origword = ' ');
        /* exclude the most common words */
        if word not in ('a','the','of','and') then output;
        i + 1;
        wordord = i;
        origword = scan(title,i);
        word = compress(lowcase(origword),'?');
    end;
run;
 
proc sql;
   create table work.wordcounts as 
   select t1.word, 
          /* count_of_word */
            (count(t1.word)) as word_count
      from work.wdcount t1
      group by t1.word
      order by word_count desc;
quit;
title "Frequently occurring words in boat name submissions";
proc print data=wordcounts(obs=25);
run;

The top words evoke the northern, cold nature of the boat's mission. Here are the top 25 words and their counts:

  1    polar         352 
  2    ice           193 
  3    explorer      110 
  4    arctic         86 
  5    red            69 
  6    sir            55 
  7    john           54 
  8    lady           46 
  9    sea            42 
 10    ocean          42 
 11    scott          41 
 12    bear           39 
 13    aurora         38 
 14    artic          37 
 15    queen          37 
 16    captain        36 
 17    james          36 
 18    endeavour      35 
 19    william        35 
 20    star           34 
 21    spirit         34 
 22    new            26 
 23    antarctic      26 
 24    boat           25 
 25    cold           25

I don't know when voting closes, so maybe whimsy will yet be outvoted by a more serious entry. Or maybe NERC will exercise their right to "take this under advisement" and set a certain standard for the finalist names. Whatever the outcome, I'm sure we haven't heard the last of Boaty...

Post a Comment

Add files to a ZIP archive with FILENAME ZIP

In previous articles, I've shared tips about how you can work with SAS and ZIP files without requiring an external tool like WinZip, gzip, or 7-Zip. I've covered:

But a customer approached me the other day with one scenario I missed: how to add SAS data sets to an existing ZIP file. It's a variation of a tip that I've already shared, but with two differences. First, in order to add a data set to a ZIP file, you have to know its physical filename -- not just the LIBNAME.MEMBER reference that you use in SAS procedure steps. And second, I had not shown how to add a new file to an existing ZIP archive -- though it turns out that's pretty simple.

Find the file name for a SAS data set

There are several ways to do this. For my approach, I used the output from PROC CONTENTS. Notice that I had to capture the ODS output (not the OUT= data set) to grab the file name. I wrapped it in a macro for easy reuse. And since I ultimately need a SAS fileref to map to the path, I've assigned one (data_fn) in my macro.

/* macro to assign a fileref to a SAS data set in a Base library */
%macro assignFilerefToDataset(_dataset_name);
    %local outDsName;
    ods output EngineHost=File;
    proc contents data=&_dataset_name.;
    run;
    proc sql noprint;
        select cValue1 into: outDsName 
            from work.file where Label1="Filename";
    quit;
    filename data_fn "&outDsName.";
%mend;

How to add a new member to a ZIP file

Now that I have the source file, I need to designate a destination file in a ZIP archive. The FILENAME ZIP method will create a new ZIP file if one does not yet exist, or it can add to an existing ZIP. To ensure I'm starting from scratch, I assign a simple fileref to my target destination and then delete the file.

/* Assign the fileref - basic file method */
filename projzip "&projectDir./project.zip";
/* Start with a clean slate - delete ZIP if it exists */
data _null_;
    rc=fdelete('projzip');
run;

To create a new ZIP file and designate a path and file name within it, I used the FILENAME ZIP method with the MEMBER= option. Note that I specified the "data/" subfolder in the MEMBER= value; this will place the file into a named subfolder within the archive.

/* Use FILENAME ZIP to add a new member -- CLASS */
/* Put it in the data subfolder */
filename addfile zip "&projectDir./project.zip" 
    member='data/class.sas7bdat';

Then finally, I need to actually "copy" the file into the archive. I do this by streaming the source file into the target fileref byte-by-byte:

/* byte-by-byte copy */
/* "copies" the new file into the ZIP archive */
data _null_;
    infile data_fn recfm=n;
    file addfile recfm=n;
    input byte $char1. @;
    put  byte $char1. @;
run;
 
filename addfile clear;

That's it! I now have a ZIP file with one member entry. Now I can "press repeat" to add a second entry:

%assignFilerefToDataset(sashelp.cars);
/* Use FILENAME ZIP to add a new member -- CARS */
/* Put it in the data subfolder */
filename addfile zip "&projectDir./project.zip" 
    member='data/cars.sas7bdat';
/* byte-by-byte copy */
/* "copies" the new file into the ZIP archive */
data _null_;
    infile data_fn recfm=n;
    file addfile recfm=n;
    input byte $char1. @;
    put  byte $char1. @;
run;
 
filename addfile clear;

Optional: Report on the ZIP file contents

If I want to report on the total contents of the ZIP file now, here's a DATA step and PROC CONTENTS step that does the job:

/* OPTIONAL for reporting */
/* Report on the contents of the ZIP file */
/* Assign a fileref wth the ZIP method */
filename inzip zip "&projectDir./project.zip";
/* Read the "members" (files) from the ZIP file */
data contents(keep=memname);
    length memname $200;
    fid=dopen("inzip");
    if fid=0 then
        stop;
    memcount=dnum(fid);
    do i=1 to memcount;
        memname=dread(fid,i);
        output;
    end;
    rc=dclose(fid);
run;
/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;
run;

Result:

Files in the ZIP file 

memname
---------------------
data/class.sas7bdat
data/cars.sas7bdat 
N = 2

I hope that this helps to make the FILENAME ZIP method more useful to those who want to try it out. I'm sure that there will be more scenarios that people will ask about; someday, if I write enough blog posts, I'll have it all covered!

Sample program: You can view/download the entire SAS program (containing the snippets I've featured and more) from my GitHub profile.

Post a Comment

The zoomiest new feature in SAS Enterprise Guide 7.12

Have you ever been in a meeting in which a presenter is showing content on a web page -- but the audience can't read it because it's too small? Then a guy sitting in the back of the room yells, "Control plus!". Because, as we all know (right?), "Ctrl+" is the universal key combination that zooms your browser content.

I'm that guy -- the one who is shouting out the key combos. Every time. And as it turns out, we don't all know about this handy way to magnify the content on a page. And do you know what else works? Holding down the Ctrl key while sliding the mouse wheel. That trick also works in Microsoft Office products like Excel, Word, and even Outlook e-mail.

With the latest release of SAS Enterprise Guide (version 7.12, released last week), you can now use these ubiquitous magical key combinations to zoom your SAS content: HTML results, process flow, data records, and even your SAS program code. In my lucky position as a SAS insider, I've been using this for months and it's absolutely my favorite new thing. Have a lengthy SAS program? Here's a fun thing to do: zoom the program editor out to 10% to see the shape of your code. Then compare with those from your friends and draw sweeping conclusions about each other, Rorschach-test style.

Here's a few animated screen shots from SAS Enterprise Guide 7.12, showing the Ctrl+ zooming in action for code, data, and HTML results:

Zoom in close to your code

Zoom in close to your code

See your data near and far

See your data near and far

Close in on your HTML results

Close in on your HTML results

Post a Comment

A custom task to list and stop your SAS sessions

Last week I described how to use PROC IOMOPERATE to list the active SAS sessions that have been spawned in your SAS environment. I promised that I would share a custom task that simplifies the technique. Today I'm sharing that task with you.

How to get the SAS Spawned Processes task

You can download the task from this SAS communities topic, where I included it as an attachment. The instructions for installation are standard for any custom task; the details are included in the README file that is part of the task package.

You can also view and pull the source code for the task from my GitHub repository. I built it using Microsoft .NET and C#.

How to use the SAS Spawned Processes task

Once you have the task installed, you can access it from the Tools->Add-In menu in SAS Enterprise Guide. (By the way, the task should also work in the SAS Add-In for Microsoft Office -- though the installation instructions are a little different.)

The task works by using PROC IOMOPERATE to connect to the SAS Object Spawner. You'll need to provide the connection information (host and port) plus the user/password for an account that has the appropriate permissions (usually a SAS admin account). Note that the port value is that of the Object Spawner operator port (by default, 8581) and not the SAS Metadata Server.

spawnedprocesses
The task shows a list of active SAS processes. Of course, you're using a SAS process to even run the task, so your active process is shown with a yellow highlight. You can select any of the processes in the list and select End Process to stop it. You can drill into more detail for any selected process with the Show Details button. Here's an example of more process details:

processprops
Did you try the task? How did it work for you? Let me know here or in the SAS communities.

Custom task features within this example

If you're professionally interested in how to build custom tasks, this example shows several techniques that implement common requirements. Use the source code as a reference to review how these are built (and of course you can always refer to my custom tasks book for more guidance).

  • Submit a SAS program in the background with the SasSubmitter class. There are two examples of this in the task. The first example is an asynchronous submit to get the list of processes, where control returns to the UI and you have the option to cancel if it takes too long. With an asynch submit, there are some slightly tricky threading maneuvers you need to complete to show the results in the task. The second example uses a synchronous submit (SubmitSasProgramAndWait) to stop a selected SAS process.
  • Read a SAS data set. The SAS program that retrieves a list of processes places that result in a SAS data set. This task uses the SAS OLE DB provider to open the data set and read the fields in each row, so it can populate the list view within the task.
  • Detect errors and show the SAS log. If the SAS programs used by the task generate any errors (for example, if you supply the wrong credentials), the task uses a simple control (SAS.Tasks.Toolkit.Controls.SASLogViewDialog) to show the SAS log -- color-coded so the error is easy to spot.
  • Retrieve the value of a SAS macro variable by using SasServer.GetSasMacroValue("SYSJOBID"). This pulls the process ID for your active SAS session, so I can compare it to those retrieved by PROC IOMOPERATE. That's how I know which list item to highlight in yellow.
  • Save and restore settings between uses. Entering credentials is a drag, so the task uses a helper class (SAS.Tasks.Toolkit.Helpers.TaskUserSettings) to save your host/port/user information to a local file in your Windows profile. When you use the task again, the saved values are placed into the fields for you. I don't save the password -- I'm sure that I'd get complaints if I did that, even if I encoded it.
Post a Comment

SAS knows it's a leap year. Do you?

Leap year questions come up all of the time in computing, but if there is any true season for it, it's now. The end of February is approaching and developers wonder: does my process know that it's a leap year, and will it behave properly?

People often ask how to use SAS to calculate the leap years. The complicated answer is:

  • Check whether the year is divisible by 4 (MOD function)
  • But add exceptions when divisible by 100
  • Yeah...except when it's also divisible by 400.

The simple answer is: ask SAS. You can create a SAS date value with the MDY function. Feb 29 is a valid date for leap years; in off years, MDY returns a missing value.

data leap_years(keep=year);
  length date 8;
  do year=2000 to 2200;
    /* MISSING when Feb 29 not a valid date */
    date=mdy(2,29,year);
    if not missing(date) then
      output;
  end;
run;

Here's an excerpt of the result:

2000
...
2080
2084
2088
2092
2096
2104
2108
2112
2116

Notice how 2000 was included (divisible by 400), but 2100 is not? That's not a leap year, and SAS knows it. Did you?

See also

In the year 9999...: history of leap year and some software bugs

Post a Comment

Using PROC IOMOPERATE to list and stop your SAS sessions

If you're a SAS administrator, you probably know that you can use SAS Management Console to view active SAS processes. These are the SAS sessions that have been spawned by clients such as SAS Enterprise Guide or SAS Add-In for Microsoft Office, or those running SAS stored processes. But did you know that you can generate a list of these processes with SAS code? It's possible with the IOMOPERATE procedure.

To use PROC IOMOPERATE, you need to know the connection information for the SAS Object Spawner: the host, port, and credentials that are valid for connecting to the Object Spawner operator port. You plug this information into a URI scheme like the following:

iom://HOSTNAME:PORT;bridge;user=USERID,pass=PASSWORD

Here's an example:

iom://myserver.company.com:8581;bridge;user=sasadm@saspw,pass=Secret01

Are you squeamish about clear-text passwords? Good for you! You can also use PROC PWENCODE to obscure the password and replace its value in the URI, like this:

iom://myserver.company.com:8581;bridge;user=sasadm@saspw,
 pass={SAS002}BA7B9D0645FD56CB1E51982946B26573

Getting useful information from PROC IOMOPERATE is an iterative process. First, you use the LIST SPAWNED command to show all of the spawned SAS processes:

%let connection=
  'iom://myserver.company.com:8581;bridge;user=sasadm@saspw,pass=Secret01';
 
/* Get a list of processes */
proc iomoperate uri=&connection.;
    list spawned out=spawned;
quit;

Example output:
listspawned
You can retrieve more details about each process by running subsequent IOMOPERATE steps with the LIST ATTRS command. This can get tedious if you have a long list of spawned sessions. I've wrapped the whole shebang into a SAS program that discovers the processes and iterates through the list for you.

%let connection=
 'iom://myserver.company.com:8581;bridge;user=sasadm@saspw,pass=Secret01';
 
/* Get a list of processes */
proc iomoperate uri=&connection.;
    list spawned out=spawned;
quit;
 
/* Use DOSUBL to submit a PROC IOMOPERATE step for   */
/* each SAS process to get details                   */
/* Then use PROC TRANSPOSE to get a row-wise version */
data _null_;
    set spawned;
    /* number each output data set */
    /* for easier appending later  */
    /* TPIDS001, TPIDS002, etc.    */
    length y $ 3;
    y = put(_n_,z3.);
    x = dosubl("
    proc iomoperate uri=&connection. launched='" || serverid || "';
    list attrs cat='Information' out=pids" || y || ";
    quit;
    data pids" || y || ";
    set pids" || y || ";
    length sname $30;
    sname = substr(name,find(name,'.')+1);
    run;
 
    proc transpose data=work.pids" || y || "
    out=work.tpids" || y || "
    ;
    id sname;
    var value;
    run;
    ");
run;
 
/* Append all transposed details together */
data allpids;
    set tpids:;
    /* calculate a legit datetime value */
    length StartTime 8;
    format StartTime datetime20.;
    starttime = input(UpTime,anydtdtm19.);
run;
 
/* Clean up */
proc datasets lib=work nolist;
delete tpids:;
delete spawned;
quit;

The output details include "up time" (when the process was launched), the process ID (a.k.a. PID), the owner account, the SAS version, and more. Here's a snippet of some example output:
detailspawn

You can use this information to stop a process, if you want. That's right: from a SAS program, you can end any (or all) of the spawned SAS processes within your SAS environment. That's a handy addition to the SAS administrator toolbox, though it should be used carefully! If you stop a process that's in active use, an unsuspecting SAS Enterprise Guide user might lose work. And he won't thank you for that!

To end (kill) a SAS process, you need to reference it by its unique identifier. In this case, that's not the PID -- it's the UUID that the LIST ATTRS command provided. Here's an example of the STOP command:

/* To STOP a process */
    proc iomoperate uri=&connection.;                                  
        STOP spawned server 
             id="03401A2E-F686-43A4-8872-F3438D272973"; 
    quit;                                                             
/* ID = value is the UniqueIdentifier (UUID)      */
/*  Not the process ID (PID)                      */

It seemed to me that this entire process could be made easier with a SAS Enterprise Guide custom task, so I've built one! I'll share the details of that within my next blog post.

Post a Comment

Sorting data in SAS: can you skip it?

TL;DR

The next time that you find yourself writing a PROC SORT step, verify that you're working with the SAS Base engine and not a database. If your data is in a database, skip the SORT!

The details: When to skip the PROC SORT step

Many SAS procedures allow you to group multiple analyses into a single step through use of the BY statement. The BY statement groups your data records by each unique combination of your BY variables (yes, you can have more than one), and performs the PROC's work on each distinct group.

When using the SAS Base data engine (that's your SAS7BDAT files), BY-group processing requires data records to be sorted, or at least pre-grouped, according to the values of the BY variables. The reason for this is that the Base engine accesses data records sequentially. When a SAS procedure is performing a grouped analysis, it expects to encounter all records of a group in a contiguous sequence. What happens when records are out of order? You might see an error like this:

ERROR: Data set SASHELP.CLASS is not sorted in ascending sequence. 
The current BY group has Sex = M and the next BY group has Sex = F.

I first described this in 2010: Getting out of SORTs with SAS data.

In a recent post, Rick Wicklin discussed a trick you can use to tell SAS that your data are already grouped, but the group values might not be in a sorted order. The NOTSORTED option lets you avoid a SORT step when you can promise that SAS won't encounter different BY group values interleaved across the data records.

Sorting data is expensive. In data tables that have lots of records, the sort processing requires tremendous amounts of temporary disk space for its I/O operations -- and I/O usually the slowest part of any data processing. But here's an important fact for SAS programmers: a SORT step is required only for SAS data sets that you access using the Base engine*. If your data resides in database, you do not need to sort or group your data in order to use BY group processing. And if you do sort the data first (as many SAS programmers do, out of habit), you're wasting time.

I'm going to use a little SAS trick to illustrate this in a program. Imagine two copies of the ubiquitous CLASS data set, one in a Base library and one in a database library. In my example I'll use the SPDE engine as the database, even though it's not a separate database server. (Yes! You can do this too! SPDE is part of Base SAS.)

/* these 3 lines will create a temp space for your */
/* SPDE data */
/* See: http://blogs.sas.com/content/sasdummy/use-dlcreatedir-to-create-folders/ */
options dlcreatedir;
libname t "%sysfunc(getoption(WORK))/spde"; 
libname t clear;
 
/* assign an SPDE library. Works like a database! */
libname spde SPDE "%sysfunc(getoption(WORK))/spde";
/* copy a table to the new library */
data spde.class;
 set sashelp.class;
run;
 
/* THIS step produces an error, because CLASS */
/* is not sorted by SEX */
proc reg data=sashelp.class;
    by sex;
    model age=weight;
run;
quit;
 
/* THIS step works correctly.  An implicit          */
/* ORDER BY clause is pushed to the database engine */
proc reg data=spde.class;
    by sex;
    model age=weight;
run;
quit;

Why does the second PROC REG step succeed? It's because the requirement for sorted/grouped records is passed through to the database using an implicit ORDER BY clause. You don't see it happening in your SAS log, but it's happening under the covers. Most SAS procedures are optimized to push these commands to the database. Most databases don't really have the concept of sorted data records; they return records in whatever sequence you request. Returning sorted data from a database doesn't have the same performance implications as a SAS-based PROC SORT step.

How does SAS Enterprise Guide generate optimized code?

Do you use SAS Enterprise Guide tasks to build your analyses? If so, you might have noticed that the built-in tasks go to great lengths to guarantee that your task will encounter data that are properly sorted. Consider this setup for the Linear Regression task:
Linear Regression task
Here's an example of the craziness that you'll see from the Linear Regression task when you have a BY variable in a Base SAS data set. There is "defensive" keep-and-sort code in there, because we want the task to work properly for any data scenario.

/* -------------------------------------------------------------------
   Determine the data set's type attribute (if one is defined)
   and prepare it for addition to the data set/view which is
   generated in the following step.
   ------------------------------------------------------------------- */
DATA _NULL_;
 dsid = OPEN("SASHELP.CLASS", "I");
 dstype = ATTRC(DSID, "TYPE");
 IF TRIM(dstype) = " " THEN
  DO;
  CALL SYMPUT("_EG_DSTYPE_", "");
  CALL SYMPUT("_DSTYPE_VARS_", "");
  END;
 ELSE
  DO;
  CALL SYMPUT("_EG_DSTYPE_", "(TYPE=""" || TRIM(dstype) || """)");
  IF VARNUM(dsid, "_NAME_") NE 0 AND VARNUM(dsid, "_TYPE_") NE 0 THEN
   CALL SYMPUT("_DSTYPE_VARS_", "_TYPE_ _NAME_");
  ELSE IF VARNUM(dsid, "_TYPE_") NE 0 THEN
   CALL SYMPUT("_DSTYPE_VARS_", "_TYPE_");
  ELSE IF VARNUM(dsid, "_NAME_") NE 0 THEN
   CALL SYMPUT("_DSTYPE_VARS_", "_NAME_");
  ELSE
   CALL SYMPUT("_DSTYPE_VARS_", "");
  END;
 rc = CLOSE(dsid);
 STOP;
RUN;
 
/* -------------------------------------------------------------------
   Sort data set SASHELP.CLASS
   ------------------------------------------------------------------- */
PROC SORT
 DATA=SASHELP.CLASS(KEEP=Age Height Sex &_DSTYPE_VARS_)
 OUT=WORK.SORTTempTableSorted &_EG_DSTYPE_
 ;
 BY Sex;
RUN;
TITLE;
TITLE1 "Linear Regression Results";
PROC REG DATA=WORK.SORTTempTableSorted
  PLOTS(ONLY)=ALL
 ;
 BY Sex;
 Linear_Regression_Model: MODEL Age = Height
  /  SELECTION=NONE
 ;
RUN;
QUIT;

This verbose code drives experienced SAS programmers crazy. But unlike a SAS programmer, the SAS Enterprise Guide code generator does not understand all of the nuances of your data, and thus can't guess what steps can be skipped. (SAS macro programmers: you know what I'm talking about. Think about all of the scenarios you have to code for/defend against in a generalized macro.)

And when running the same task with a database table? SAS Enterprise Guide detects the table source is a database, and builds a much more concise version:

TITLE;
TITLE1 "Linear Regression Results";
PROC REG DATA=SPDE.CLASS
  PLOTS(ONLY)=ALL
 ;
 BY Sex;
 Linear_Regression_Model: MODEL Age = Weight
  /  SELECTION=NONE
 ;
RUN;
QUIT;

Consider the data source, and -- if you can -- skip the SORT!

* The Base engine is not the only sequential data engine in SAS, but it's the most common.

Post a Comment

A viral video that was 47 years in the making

American falls as seen from Canada in 2013

American side seen from Canada in 2013

When he filmed the scene in the summer of '69, my Dad did not foresee his moment of fame in 2016. But in the last two days, Dad has seen his 47-year-old work appear in the local Buffalo, NY media, on DailyMail.com, and on FOX News*.

In August of 1969, on a family outing to Niagara Falls, Dad filmed a remarkable scene. It was during the time that engineers had "turned off" the American side of the Falls**, diverting most of the water to the Canadian side, while scientists studied the natural wonder for erosion patterns. Did you know that it was possible to turn off the mighty Niagara Falls? Yes, it's been done. And there is renewed interest in the event because Niagara Falls authorities are talking about doing it again.

 
It might be generous to call the video "viral." By most definitions, a video can be called "viral" if it receives a million views in a day, or 3 to 5 million views in a few days. This video (on my personal YouTube channel) has received only about 40,000 views in the past day. Not viral, but let's call it "burgeoning" (thank you Roget's). (UPDATE: one week later, the video now has over 100,000 views!)

Here's the historical timeline of this video:

  • August 1969: Dad films the dewatered Niagara Falls on a common 8mm film camera. I'm in the video at the end -- that's me in the stroller (I was 1-yr old) with my Mom.
  • November 2006: as steward of the family 8mm films, I digitize the film and edit it. I added some explanatory text and a bed of music that I don't have the rights to use (hey, that makes me a citizen of the Internet).
  • April 2011: I upload the video to YouTube. I figured it would be interesting to some, as it captured a rare event. A once-in-a-lifetime event, we might have thought back then. In nearly 5 years, the video accumulates only a few thousand views.
  • January 2016: a perfect storm makes the video super popular. The conditions of this storm: a related modern story renews interest, the video contains relatively rare footage, and (maybe most importantly) the video producer (me) is available and responsive to grant permission to these media outlets.

That last point was probably crucial. In all three cases (Buffalo News, Daily Mail, and FOX News), the stories were produced within hours of the reporters reaching out to me. The stories were happening with or without my video. Like so many events in my life, this was all about being in the right place at the right time.

This blog topic is a departure from my usual discussion of SAS topics, so let's tie it back with a view of some YouTube stats. YouTube provides video analytics to any user with a YouTube channel, but the stats usually lag by several days. It's too soon to see the aggregated view of my stats that include the past two days. But, YouTube does offer a "real time" view of what is happening with your video right now. Here's my snapshot from this morning:
ytstats

If you watch the video in the next few days you'll be subjected to some advertising. That's how YouTube generates revenue from popular content. Thanks to my use of copyrighted music, I don't really have a chance to benefit financially from this sudden burst of activity. But that's okay with me -- I enjoy just watching the phenomenon to see how far it goes.


* FOX News reporters reached out to me yesterday and said the story would air yesterday afternoon. I haven't seen it, but my Dad confirmed they aired the video and gave him the photo credit.


** The idea of "turning off the Falls" sounds crazy to some, but really it's an impressive feat of engineering that was mastered decades ago. People are not generally aware that much of the "Falls" volume is diverted every day right now to provide hydroelectric power to the Northeast. Remember Y2K? When people were worried that the power grid might shut down when the year turned to 2000, one certainty remained: water would continue to flow over the Falls. The Niagara Falls hydro plant played a critical role in disaster preparations for Y2K. Of course, nothing came of it – Y2K was a big disappointment in that respect.

Post a Comment

Using the ODS statement to add layers in your ODS sandwich

The ODS statement controls most aspects of how SAS creates your output results. You use it to specify the destination type (HTML, PDF, RTF, EXCEL or something else), as well as the details of those destinations: file paths, appearance styles, graphics behaviors, and more. The most common use pattern is the "ODS sandwich." In this pattern, you open the destination with the ODS statement, then include all of the code that generates the substance of the output, and then use an ODS CLOSE statement to finish it off. Here's a classic example:

ods html file="c:\project\myout.html" /* top slice of bread */
  style=journal gpath="c:\project";
  proc means data=sashelp.class;      /* the "meat" */
  run;
 
  proc sgplot data=sashelp.class;
  histogram weight;
  run;
ods html close;                       /* bottom slice */

But did you know that you can insert more ODS statements to adjust ODS behavior midstream? These allow you to use a variety of ODS behaviors within a single result. You can create your own "Dagwood sandwich" version of SAS output! For cultural reference:

Dagwood sandwich: A Dagwood is a tall, multi-layered sandwich made with a variety of meats, cheeses, and condiments. It was named after Dagwood Bumstead, a central character in the comic strip Blondie, who is frequently illustrated making enormous sandwiches. Source: Wikipedia

Here's an example program that changes graph style and title behavior within a single ODS output file. You should be able to try this code in any SAS programming environment.

ods _all_ close;
%let outdir = %sysfunc(getoption(WORK));
ods graphics / width=400 height=400;
 
ods html(id=dagwood) file="&outdir./myout.html"
  style=journal gtitle  
  gpath="&outdir.";
 
  title "Example ODS Dagwood sandwich";
  proc means data=sashelp.class;
  run;
ods layout gridded columns=2;
ods region;
ods html(id=dagwood) style=statdoc ;  
  proc sgplot data=sashelp.class;
  title "This title is part of the graph image, Style=STATDOC";
  histogram weight;
  run;
ods region;
ods html(id=dagwood) style=raven nogtitle;
  title "This title is in the HTML, Style=RAVEN";
  proc sgplot data=sashelp.class;
  histogram height;
  run;
ods layout end;
 
ods html(id=dagwood) close;

odsexHere's the result, plus some important items to note about this technique.

  • It's a good practice to distinguish each ODS destination with an ID= value. This allows you to reference the intended ODS stream with no ambiguity. After all, you can have multiple ODS destinations open at once, even multiple destinations of the same type. In my example, I used ID=dagwood to make it obvious which destination the statement applies to.
  • You can use this technique to modify only those directives that can change "mid-file" to apply to different parts of the output. You can't modify those items that apply to the entire file, such as PATH, ENCODING, STYLESHEET and many more. These can be set just once when you create the file; setting multiple different values wouldn't make sense.

You can use this technique within those applications that generate ODS statements for you, such as SAS Enterprise Guide. For example, to modify the default SAS Enterprise Guide HTML output "midstream", add a statement like:

ods html(id=eghtml) /*... plus your options, like STYLE=*/ ;
ods html(eghtml) /* this shorthand works too */ ;

In SAS Studio or SAS University Edition, try this:
ods html5(id=web) /*... plus your options, like STYLE=*/ ;
ods html5(web) /* this shorthand works too */ ;

Example: an ODS Graphics style sampler

samplerHere's one more example that puts it all together. Have you ever wanted an easy way to check the appearance of the dozens of different built-in ODS styles? Here's a SAS macro program that you can run in SAS Enterprise Guide (with the HTML result on) that generates a "sampler" of graphs that show variations in fonts, colors, and symbols across the different styles.

This example uses ODS LAYOUT (production in SAS 9.4) to create a gridded layout of example plots. If you want to try this in SAS Studio or in SAS University Edition, you can adjust one line in the program (as noted in the code comments).

/* Run within SAS Enterprise Guide       */
/* with the HTML result option turned ON */
%macro styleSampler;
title;
proc sql noprint;
  select style into :style1-:style99  
    from sashelp.vstyle 
    where libname="SASHELP" and memname="TMPLMST";
 
  ods layout gridded columns=4;
  ods graphics / width=300 height=300;
  %do index=1 %to &sqlobs;
    ods region;
    ods html(eghtml) gtitle style=&&style&index.;
    /* In SAS Studio, use this instead: */
    /* ods html5(web) gtitle style=&&style&index.; */
    title "Style=&&style&index.";
    proc sgplot data=sashelp.class;
    scatter x=Height y=Weight /group=Sex;
    reg x=Age y=Weight / x2axis;
    run; 
  %end;
  ods layout end;
%mend;
 
%styleSampler;

See also

Take control of ODS results in SAS Enterprise Guide
Best way to suppress ODS output in SAS
Advanced ODS Graphics techniques: a new free book

Post a Comment