Tell SAS to read a database field as CHAR instead of INT or BIGINT

Yesterday a frustrated SAS user complained on Twitter. He's working with a database that stores an ID field as a big long number (perhaps using the database BIGINT type), and SAS can't display a number greater than 15 digits. Well, it's actually 16 digits, depending on the value:

%put Biggest Exact Int = %sysfunc(constant(EXACTINT,8));
>> Biggest Exact Int = 9007199254740992

It's a controversial design decision to use an integer to represent an ID value in a database. You might save a few bytes of storage, but it limits your ability to write programs (not just SAS programs) that have to store and manipulate that value. And if you don't need to do math operations with the ID, your data consumers would rather see a nice character value there.

Fortunately, when working with databases, you can tell SAS to read numeric values as character values into your SAS data sets. In addition to solving the precision problem I've just described, this can also help when you need to join database fields with other source systems that store their key fields differently. It's usually much easier to convert the field "on the way in" rather than try to mangle it after you've already read in the records. Use the DBSASTYPE= data set option to tell SAS how to read database fields. Here's a sample SAS program that shows how I access a table using ODBC, one step without and one step with the DBSASTYPE= option.

libname wpblogs odbc datasrc="wpblogs";
options obs=10;
data users_IDint (keep=ID display_name);
  set wpblogs.wp_users;
data users_IDchar (keep=ID display_name);
  set wpblogs.wp_users 

Here are the resulting tables; you can see the simple difference. One has ID as a number, and one has it as a character. Magic!

The DBSASTYPE= option is supported for virtually all SAS/ACCESS database engines, including the ubiquitous SAS/ACCESS to ODBC.

Oh, and you might be wondering how things turned out for our frustrated user on Twitter. Our SAS Cares social media team heard his plea and responded -- as they always do. And our user not only found the information useful, he took it a step further by replying back with an additional syntax tip.

Post a Comment

Copy McCopyface and the new naming revolution

As a parent of children who love books, I can tell you that there is something humorous about taking a first name, adding a "Mc" and then a rhyming surname to make up a brand new character name. My daughters always loved to read the adventures of Harry Mclary from Donaldson's Dairy, and we loved to read it aloud to them. It was just fun.

The Boaty McBoatface phenomenon has taken this to the next level by adding "face" as a suffix, which often has a funny punctuating effect ("silly face", "Chu chi face", "doody face," etc. Hilarious!).

I thought that I was done writing blogs about Boaty McBoatface, but I've been hearing from so many people about this topic that I need at least this one more to finish it off.

Name our ship: final results

Spoiler: NERC is not going to christen the new vessel "Boaty McBoatface." Instead the name comes from the 4th-highest vote-getter, "David Attenbourough." The famous explorer earned over 11,000 votes, or 2.78% of all votes cast. However, as a crowd-pleasing nod to the plebians, NERC will name one of the ship's remotely operated submarines "Boaty McBoatface." Hooray! I grabbed the final voting results from the NameOurShip website and re-ran my analysis. Here's the final top 10 standings.

I also ran a three-panel visualization, using the method that Rick Wicklin shared, so you can see how unevenly distributed the votes remained at the end.


Many imitators, but original stays on top

The original entry of "Boaty McBoatface" inspired many copycats who submitted names with a similar formula. None of them seemed to have the wide appeal of Boaty, probably because they weren't first and original, but here they are with their vote counts.

I found these in the data with a simple SQL LIKE operator, finding those names that had the pattern "blank-y Mc-blank-y".

proc sql;
   create table work.TheMCs as 
   select t1.title, 
            (sum(t1.likes)) format=comma20. as totalVotes
      from work.votes t1
      where t1.title like '%y Mc%'
      group by t1.title
      order by totalVotes desc;

Boaty Mac: start of a popular movement

Silly names are not limited to research vessels. The world has embraced the Boaty McBoatface pattern. A colleague sent me news about Parsey McParseface, an open-source project from Google. Grumpy McNoisybutt was proposed as a name for a rattlesnake. Even my own daughter has created Rocky McRockface, a major character in her rock cycle project.

I won't say that this is my final Boaty post. Who knows? In a couple of years I might be reporting on "Boaty McBoatface"-inspired baby names. I'm confident that at least one poor child will bear the name; that's the sort of world we live in. Fortunately, children usually find a way to have revenge on their parents (which is why I have nothing but praise for Rocky McRockface).

Post a Comment

Ad-hoc reporting with SAS: Tips for the e-mail jockey

What's the most common data reporting mechanism? Is it web-based reporting? PDFs? How about spreadsheets? Maybe, but in my experience many reports are delivered using a less-scalable and transient mechanism: e-mail.

I'm a data steward at SAS. Specifically, I look after the operational data around our blogging program and our online communities. Even though we have many self-serve reports already set up for these programs, I'm often approached with specific one-off questions. When I get asked the same question multiple times, I'll create a report that runs automatically and stays current for future requests. But for most jobs I find myself running queries and pasting the result into e-mail. Just because the request is ad-hoc doesn't mean it has to take a lot of time or appear as "quick and dirty." Here's my process for creating solid e-mail responses that please my stakeholders.

Step 1: Select data values, then Copy with Headers

This is one of my favorite new features in SAS Enterprise Guide 7.1. I use this technique several times per week. It's a big time saver because it grabs the selected data values and their column names. It places them onto the Windows clipboard as tab-delimited data.

If you don't have version 7.1 then you can approximate this technique by selecting Send To->Microsoft Excel. This launches Microsoft Excel, opens a new sheet and populates it with all of the data in the data grid, including the headers. You can then copy your selection from the sheet and continue.

Step 2. Paste selection into new e-mail message

When you paste tab-delimited data into Microsoft Outlook, you get a raggedy-looking set of lines. But don't worry -- Step 3 will take care of that. Other e-mail programs might actually create a table for you automatically.


Step 3. Apply "Convert text to table" action

In Microsoft Outlook, this is an option on the Message menu, under the Table pulldown. It converts your selected text into a true table, offering you options to confirm the number of columns and rows. Or use this shortcut: simply select Insert Table with your data selected, and Outlook creates the table.


Optional Step 4: Beautify

Microsoft Outlook offers a number of canned crowd-pleasing table layouts that will format your headings, rows and columns so they look fancy yet readable. Pick your favorite and apply.

You can see me stumble through all of the steps in this animation:

The result is an attractive table that answers a question. It looks like something that might have taken you hours to prepare. There is one danger though: because you accomplished the task so quickly, your constituents might feel emboldened to ask more difficult questions, and with greater frequency.

Post a Comment

How to run SAS programs in Jupyter Notebook

We've just celebrated Earth Day, but I'm here to talk about Jupyter -- and the SAS open source project that opens the door for more learning. With this new project on the page, SAS contributes new support for running SAS from within Jupyter Notebooks -- a popular browser-based environment used by professors and data scientists.

My colleague Amy Peters announced this during a SAS Tech Talk show at SAS Global Forum 2016. If you want to learn more about Jupyter and see the SAS support in action, then you can watch the video here.

Visit the project on GitHub: sas_kernel by sassoftware

Within Jupyter, the sas_kernel provides multiple ways to access SAS programming methods. The most natural method is to create a new SAS notebook, available from the New menu in the Jupyter Home window and from the File menu in an active notebook:

From a SAS notebook, you can enter and run SAS code directly from a cell:

There is even a Notebook extension (./nbextensions/showSASLog) that can show you the SAS log.

The second way that you can run SAS code is by using special Jupyter "magics" supported by the sas_kernel. These magic commands look almost just like SAS macro calls (imagine that!). From within a Python language notebook, you can inject your SAS program code and pull in SAS results. This allows you to move easily between Python and SAS in a single environment. Here's a simple example:

proc means;
ods graphics / height=500 width=800;
proc sgplot;
histogram msrp;

How to get started

Update 25July2016: Support for Jupyter is now part of the SAS University Edition! See the SAS University Edition documentation for information about how to use it.

In SAS environments outside of SAS University Edition, here's what you need to run SAS with Jupyter:

  • SAS 9.4 or later running on Linux
  • Python 3 installed on the same machine (that's basically part of Linux)
  • Access to the Linux shell to install/configure the Jupyter Notebook infrastructure and the sas_kernel.

End users of Jupyter Notebook do not need special privileges. And you can access it from a browser on any system: Windows, Mac, Linux...whatever. In the SAS Tech Talk video with Amy, we were running on my Windows laptop using Chrome, connecting to a Linux instance of Jupyter and SAS. The GitHub project has all of the doc and step-by-step instructions for installation.

What's next for SAS and Jupyter?

This is just the start for SAS in the Jupyter world. Amy says that she has already received lots of interest and feedback, and SAS is working to make the Jupyter Notebook approach available in something like SAS University Edition and SAS OnDemand for Academics. Stay tuned!

Post a Comment

Boaty McBoatface is on the run

I know what you're thinking: two "Boaty McBoatface" articles within two weeks? And we're past April Fool's Day?

But since I posted my original analysis about the "Name our ship" phenomenon that's happening in the UK right now, a new contender has appeared: Poppy-Mai.

The cause of Poppy-Mai, a critically ill infant who has captured the imagination of many British citizens (and indeed, of the world), has made a very large dent in the lead that Boaty McBoatface holds.

Yes, "Boaty" still has a-better-than 4:1 lead. But that's a lot closer than the 10:1 lead (over "Henry Worsley") from just over a week ago. Check out the box plot now: you can actually make out a few more dots. Voting is open for another 10 days -- and as we have seen, a lot can happen in that time.

As I take this second look at the submissions (now almost 6300) and voting data (almost 350,000 votes cast), I've found a few more entries that made me chuckle. Some of them struck me by their word play, and others cater to my nerdy sensibilities. Here they are (capitalization retained):

While I'm on this topic, I want to give a shout-out to regex101, the online regular expression tester. I was able to develop and test my regular expressions before dropping them into a PRXPARSE function call. I found that I had to adjust my regular expression to cast a wider net for valid titles from the names submissions data. Previously, I wasn't capturing all of the punctuation. While that's probably because I didn't expect punctuation to be part of a ship's name, that assumption doesn't stop people from suggesting and voting on such names. My new regex match:

  title_regex = prxparse("/\'title\':\s?\""([a-zA-Z0-9\'\.\-\_\#\s\$\%\&\(\)\@\!]+)/");

I could probably optimize by specifying an exception pattern instead of an inclusion pattern...but this isn't the sort of project where I worry about that.

Will I write about Boaty McBoatface again? What will my next Boaty article reveal? Stay tuned!

Post a Comment

And it's Boaty McBoatface by an order of magnitude

In a voting contest, is it possible for a huge population to get behind a ridiculous candidate with such force that no other contestant can possibly catch up? The answer is: Yes.

Just ask the folks at NERC, the environmental research organization in the UK. They are commissioning a new vessel for polar research, and they decided to crowdsource the naming process. Anyone in the world is welcome to visit their NameOurShip web site and suggest a name or vote on an existing name submission.

As of today, the leading name is "RRS Boaty McBoatface." ("RRS" is standard prefix for a Royal Research Ship.) This wonderfully creative name is winning the race by more than just a little bit: it has 10 times the number of votes as the next highest vote getter, "RRS Henry Worsley".

I wondered whether the raw data for this poll might be available, and I was pleased to find it embedded in the web page that shows the current entries. The raw data is in JSON format, embedded in the source of the HTML page. I saved the web page source to my local machine, copied out just the JSON line with the submissions data, then used SAS to parse the results. Here's my code:

filename records "c:\projects\votedata.txt";
data votes (keep=title likes);
 length likes 8;
 format likes comma20.;
 label likes="Votes";
 length len 8;
 infile records;
  if _n_ = 1 then
      retain likes_regex title_regex;
      likes_regex = prxparse("/\'likes\'\:\s?([0-9]*)/");
      title_regex = prxparse("/\'title\':\s?\""([a-zA-Z0-9\'\s]+)/");
 position = prxmatch(likes_regex,_infile_);
  if (position ^= 0) then
      call prxposn(likes_regex, 1, start, len);
      likes = substr(_infile_,start,len);
 start=0; len=0;
 position = prxmatch(title_regex,_infile_);
  if (position ^= 0) then
      call prxposn(title_regex, 1, start, len);
      title = substr(_infile_,start,len);

With the data in SAS, I used PROC FREQ to show the current tally:

title "Vote tally for NERC's Name Our Ship campaign";
proc freq data=votes order=freq;
table title;
weight likes;

The numbers are compelling: good ol' Boaty Mac has over 42% of the nearly 200,000 votes. The arguably more-respectable "Henry Worsley" entry is tracking at just 4%. I'm not an expert on polling and sample sizes, but even I can tell that Boaty McBoatface is going to be tough to beat.

To drive the point home a bit more, let's look at a box plot of the votes distribution.

title "Distribution of votes for ALL submissions";
proc sgplot data=votes;
hbox likes;
xaxis valueattrs=(size=12pt);

In this output, we have a clear outlier:
If we exclude Boaty, then it shows a slightly closer race among the other runners up (which include some good serious entries, plus some whimsical entries, such as "Boatimus Prime"):

title "Distribution of votes for ALL submissions except Boaty McBoatface";
proc sgplot data=votes(where=(title^="Boaty McBoatface"));
hbox likes;
xaxis valueattrs=(size=12pt);

See the difference between the automatic axis values between the two graphs? The tick marks show 80,000 vs. 8,000 as the top values.

Digging further, I wondered whether there were some recurring themes in the entries. I decided to calculate word frequencies using a technique I found on our SAS Support Communities (thanks to Cynthia Zender for sharing):

/* Tally the words across all submissions */
data wdcount(keep=word);
    set votes;
    i = 1;
    origword = scan(title,i);
    word = compress(lowcase(origword),'?');
    wordord = i;
    do until (origword = ' ');
        /* exclude the most common words */
        if word not in ('a','the','of','and') then output;
        i + 1;
        wordord = i;
        origword = scan(title,i);
        word = compress(lowcase(origword),'?');
proc sql;
   create table work.wordcounts as 
   select t1.word, 
          /* count_of_word */
            (count(t1.word)) as word_count
      from work.wdcount t1
      group by t1.word
      order by word_count desc;
title "Frequently occurring words in boat name submissions";
proc print data=wordcounts(obs=25);

The top words evoke the northern, cold nature of the boat's mission. Here are the top 25 words and their counts:

  1    polar         352 
  2    ice           193 
  3    explorer      110 
  4    arctic         86 
  5    red            69 
  6    sir            55 
  7    john           54 
  8    lady           46 
  9    sea            42 
 10    ocean          42 
 11    scott          41 
 12    bear           39 
 13    aurora         38 
 14    artic          37 
 15    queen          37 
 16    captain        36 
 17    james          36 
 18    endeavour      35 
 19    william        35 
 20    star           34 
 21    spirit         34 
 22    new            26 
 23    antarctic      26 
 24    boat           25 
 25    cold           25

I don't know when voting closes, so maybe whimsy will yet be outvoted by a more serious entry. Or maybe NERC will exercise their right to "take this under advisement" and set a certain standard for the finalist names. Whatever the outcome, I'm sure we haven't heard the last of Boaty...

Post a Comment

Add files to a ZIP archive with FILENAME ZIP

In previous articles, I've shared tips about how you can work with SAS and ZIP files without requiring an external tool like WinZip, gzip, or 7-Zip. I've covered:

But a customer approached me the other day with one scenario I missed: how to add SAS data sets to an existing ZIP file. It's a variation of a tip that I've already shared, but with two differences. First, in order to add a data set to a ZIP file, you have to know its physical filename -- not just the LIBNAME.MEMBER reference that you use in SAS procedure steps. And second, I had not shown how to add a new file to an existing ZIP archive -- though it turns out that's pretty simple.

Find the file name for a SAS data set

There are several ways to do this. For my approach, I used the output from PROC CONTENTS. Notice that I had to capture the ODS output (not the OUT= data set) to grab the file name. I wrapped it in a macro for easy reuse. And since I ultimately need a SAS fileref to map to the path, I've assigned one (data_fn) in my macro.

/* macro to assign a fileref to a SAS data set in a Base library */
%macro assignFilerefToDataset(_dataset_name);
    %local outDsName;
    ods output EngineHost=File;
    proc contents data=&_dataset_name.;
    proc sql noprint;
        select cValue1 into: outDsName 
            from work.file where Label1="Filename";
    filename data_fn "&outDsName.";

How to add a new member to a ZIP file

Now that I have the source file, I need to designate a destination file in a ZIP archive. The FILENAME ZIP method will create a new ZIP file if one does not yet exist, or it can add to an existing ZIP. To ensure I'm starting from scratch, I assign a simple fileref to my target destination and then delete the file.

/* Assign the fileref - basic file method */
filename projzip "&projectDir./";
/* Start with a clean slate - delete ZIP if it exists */
data _null_;

To create a new ZIP file and designate a path and file name within it, I used the FILENAME ZIP method with the MEMBER= option. Note that I specified the "data/" subfolder in the MEMBER= value; this will place the file into a named subfolder within the archive.

/* Use FILENAME ZIP to add a new member -- CLASS */
/* Put it in the data subfolder */
filename addfile zip "&projectDir./" 

Then finally, I need to actually "copy" the file into the archive. I do this by streaming the source file into the target fileref byte-by-byte:

/* byte-by-byte copy */
/* "copies" the new file into the ZIP archive */
data _null_;
    infile data_fn recfm=n;
    file addfile recfm=n;
    input byte $char1. @;
    put  byte $char1. @;
filename addfile clear;

That's it! I now have a ZIP file with one member entry. Now I can "press repeat" to add a second entry:

/* Use FILENAME ZIP to add a new member -- CARS */
/* Put it in the data subfolder */
filename addfile zip "&projectDir./" 
/* byte-by-byte copy */
/* "copies" the new file into the ZIP archive */
data _null_;
    infile data_fn recfm=n;
    file addfile recfm=n;
    input byte $char1. @;
    put  byte $char1. @;
filename addfile clear;

Optional: Report on the ZIP file contents

If I want to report on the total contents of the ZIP file now, here's a DATA step and PROC CONTENTS step that does the job:

/* OPTIONAL for reporting */
/* Report on the contents of the ZIP file */
/* Assign a fileref wth the ZIP method */
filename inzip zip "&projectDir./";
/* Read the "members" (files) from the ZIP file */
data contents(keep=memname);
    length memname $200;
    if fid=0 then
    do i=1 to memcount;
/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;


Files in the ZIP file 

N = 2

I hope that this helps to make the FILENAME ZIP method more useful to those who want to try it out. I'm sure that there will be more scenarios that people will ask about; someday, if I write enough blog posts, I'll have it all covered!

Sample program: You can view/download the entire SAS program (containing the snippets I've featured and more) from my GitHub profile.

Post a Comment

The zoomiest new feature in SAS Enterprise Guide 7.12

Have you ever been in a meeting in which a presenter is showing content on a web page -- but the audience can't read it because it's too small? Then a guy sitting in the back of the room yells, "Control plus!". Because, as we all know (right?), "Ctrl+" is the universal key combination that zooms your browser content.

I'm that guy -- the one who is shouting out the key combos. Every time. And as it turns out, we don't all know about this handy way to magnify the content on a page. And do you know what else works? Holding down the Ctrl key while sliding the mouse wheel. That trick also works in Microsoft Office products like Excel, Word, and even Outlook e-mail.

With the latest release of SAS Enterprise Guide (version 7.12, released last week), you can now use these ubiquitous magical key combinations to zoom your SAS content: HTML results, process flow, data records, and even your SAS program code. In my lucky position as a SAS insider, I've been using this for months and it's absolutely my favorite new thing. Have a lengthy SAS program? Here's a fun thing to do: zoom the program editor out to 10% to see the shape of your code. Then compare with those from your friends and draw sweeping conclusions about each other, Rorschach-test style.

Here's a few animated screen shots from SAS Enterprise Guide 7.12, showing the Ctrl+ zooming in action for code, data, and HTML results:

Zoom in close to your code

Zoom in close to your code

See your data near and far

See your data near and far

Close in on your HTML results

Close in on your HTML results

Post a Comment

A custom task to list and stop your SAS sessions

Last week I described how to use PROC IOMOPERATE to list the active SAS sessions that have been spawned in your SAS environment. I promised that I would share a custom task that simplifies the technique. Today I'm sharing that task with you.

How to get the SAS Spawned Processes task

You can download the task from this SAS communities topic, where I included it as an attachment. The instructions for installation are standard for any custom task; the details are included in the README file that is part of the task package.

You can also view and pull the source code for the task from my GitHub repository. I built it using Microsoft .NET and C#.

How to use the SAS Spawned Processes task

Once you have the task installed, you can access it from the Tools->Add-In menu in SAS Enterprise Guide. (By the way, the task should also work in the SAS Add-In for Microsoft Office -- though the installation instructions are a little different.)

The task works by using PROC IOMOPERATE to connect to the SAS Object Spawner. You'll need to provide the connection information (host and port) plus the user/password for an account that has the appropriate permissions (usually a SAS admin account). Note that the port value is that of the Object Spawner operator port (by default, 8581) and not the SAS Metadata Server.

The task shows a list of active SAS processes. Of course, you're using a SAS process to even run the task, so your active process is shown with a yellow highlight. You can select any of the processes in the list and select End Process to stop it. You can drill into more detail for any selected process with the Show Details button. Here's an example of more process details:

Did you try the task? How did it work for you? Let me know here or in the SAS communities.

Custom task features within this example

If you're professionally interested in how to build custom tasks, this example shows several techniques that implement common requirements. Use the source code as a reference to review how these are built (and of course you can always refer to my custom tasks book for more guidance).

  • Submit a SAS program in the background with the SasSubmitter class. There are two examples of this in the task. The first example is an asynchronous submit to get the list of processes, where control returns to the UI and you have the option to cancel if it takes too long. With an asynch submit, there are some slightly tricky threading maneuvers you need to complete to show the results in the task. The second example uses a synchronous submit (SubmitSasProgramAndWait) to stop a selected SAS process.
  • Read a SAS data set. The SAS program that retrieves a list of processes places that result in a SAS data set. This task uses the SAS OLE DB provider to open the data set and read the fields in each row, so it can populate the list view within the task.
  • Detect errors and show the SAS log. If the SAS programs used by the task generate any errors (for example, if you supply the wrong credentials), the task uses a simple control (SAS.Tasks.Toolkit.Controls.SASLogViewDialog) to show the SAS log -- color-coded so the error is easy to spot.
  • Retrieve the value of a SAS macro variable by using SasServer.GetSasMacroValue("SYSJOBID"). This pulls the process ID for your active SAS session, so I can compare it to those retrieved by PROC IOMOPERATE. That's how I know which list item to highlight in yellow.
  • Save and restore settings between uses. Entering credentials is a drag, so the task uses a helper class (SAS.Tasks.Toolkit.Helpers.TaskUserSettings) to save your host/port/user information to a local file in your Windows profile. When you use the task again, the saved values are placed into the fields for you. I don't save the password -- I'm sure that I'd get complaints if I did that, even if I encoded it.
Post a Comment

SAS knows it's a leap year. Do you?

Leap year questions come up all of the time in computing, but if there is any true season for it, it's now. The end of February is approaching and developers wonder: does my process know that it's a leap year, and will it behave properly?

People often ask how to use SAS to calculate the leap years. The complicated answer is:

  • Check whether the year is divisible by 4 (MOD function)
  • But add exceptions when divisible by 100
  • Yeah...except when it's also divisible by 400.

The simple answer is: ask SAS. You can create a SAS date value with the MDY function. Feb 29 is a valid date for leap years; in off years, MDY returns a missing value.

data leap_years(keep=year);
  length date 8;
  do year=2000 to 2200;
    /* MISSING when Feb 29 not a valid date */
    if not missing(date) then

Here's an excerpt of the result:


Notice how 2000 was included (divisible by 400), but 2100 is not? That's not a leap year, and SAS knows it. Did you?

See also

In the year 9999...: history of leap year and some software bugs

Post a Comment