Using LIBNAME XLSX to read and write Excel files

When you weren't watching, SAS did it again. We smuggled Yet Another Excel Engine into a SAS release.

SAS 9.4 Maintenance 2 added the XLSX engine, which allows you to read and write Microsoft Excel files as if they were data sets in a library. The big advantage of using this engine is that it accesses the XLSX file directly, and doesn't use the Microsoft data APIs as a go-between. (LIBNAME EXCEL and LIBNAME PCFILES rely on those Microsoft components.) That means that you can use this engine on Windows or Unix systems without having to worry about bitness (32-bit versus 64-bit) or setting up a separate PC Files Server process.

The XLSX engine does require a license for SAS/ACCESS to PC Files. Are you a SAS University Edition user? The SAS/ACCESS product is part of that package, so this technique works there. It's an easy way to get well-formed Excel data into your SAS process.

/* because Excel field names often have spaces */
options validvarname=any;
 
libname xl XLSX '/folders/myfolders/sas_tech_talks_15.xlsx';
 
/* discover member (DATA) names */
proc datasets lib=xl; quit;
 
libname xl CLEAR;

Example output:

xl_contents
Once the library is assigned, I can read the contents of a spreadsheet into a new SAS data set:

/* because Excel field names often have spaces */
options validvarname=any;
 
libname xl XLSX '/folders/myfolders/sas_tech_talks_15.xlsx';
 
/* read in one of the tables */
data confirmed;
  set xl.confirmed;
run;
 
libname xl CLEAR;

And here's the result in my SAS University Edition:

xl_confirmed
Sometimes you need just one value from a spreadsheet. That's a common use case for dynamic data exchange (DDE), which isn't as feasible as it once was. You can use FIRSTOBS and OBS options to control how much data you retain:

/* read in just one value */
data _null_;
  set xl.confirmed (firstobs=6 obs=6 keep='Job title'n);
  call symput('VALUE','Job Title'n);
run;
%put &value;

Output:

 76         %put &value;
 Testing Manager,  Quality-driven User Experience Testing

You can also use the XLSX engine to create and update XLSX files.

libname xlout XLSX '/folders/myfolders/samples.xlsx';
 
data xlout.cars;
  set sashelp.cars;
run;
 
data xlout.classfit;
  set sashelp.classfit;
run;
 
data xlout.baseball;
  set sashelp.baseball;
run;
 
data xlout.air;
  set sashelp.air;
run;
 
libname xlout clear;

Here is my output in Microsoft Excel with all of these data sets now as sheets:

xl_xlsxout
Remember, you can also create Microsoft Excel files with Base SAS by using ODS EXCEL -- experimental in 9.4 Maintenance 2 but production in Maintenance 3 (coming soon).

The XLSX libname is different from the EXCEL and PCFILES engines in other ways. For example, the XLSX engine does not support Excel named ranges (which can surface a portion of a spreadsheet as a discrete table). Also, you won't see the familiar "$" decoration around the spreadsheet names when they are surfaced in the library within SAS. If you need that sort of flexibility, you can use PROC IMPORT to provide more control over exactly what Excel content is brought into SAS and how.

One other IMPORTANT caution: The XLSX engine is a sequential access engine in that it processes data one record after the other. The engine starts at the beginning of the file and continues in sequence to the end of the file. Some techniques to MODIFY the data in-place will not work. Also, some SAS data viewers cannot render the data from the XLSX engine. SAS VIEWTABLE and SAS Enterprise Guide and even SAS Studio can't open these tables directly in the data grid view. VIEWTABLE gives you a nice message, but SAS Enterprise Guide simply "hangs" in the attempt. For that reason, I recommend using DATA step to copy the Excel content that you want to another SAS library, then CLEAR the XLSX library to avoid accidentally opening a table in a viewer that won't support it. (This is currently a bug in SAS Enterprise Guide that should be fixed in a future release.)

I have found LIBNAME XLSX to be a quick, convenient method to bring in Excel data on any SAS platform. If you have SAS 9.4 Maintenance 2 or later, try it out! Let me know how it works for you by sharing a comment here.

Post a Comment

Filter your SAS Enterprise Guide data with a WHERE clause

Are you a VIEWTABLE fan from the SAS Windowing Environment (a.k.a. Display Manager, DMS, PC SAS)? If so, the latest version of SAS Enterprise Guide has a new feature that you'll love.

With the latest update to SAS Enterprise Guide 7.1 (7.11), you can now subset your data in the data grid by typing a WHERE clause filter. The new Where button appears at the top of the data grid.

Simply click the button, and type a filter expression into the text field. Hit the Enter key to apply the filter:

where_ex1
Just as you might expect from SAS, you can build more complex expressions too. For example, you can transform the variable that you're comparing to lower case so that case-insensitive filters are easier to build:

where_ex2
And if you have continuous variables such as a date variable, you can subset by using BETWEEN to specify the boundaries you want to see:

where_ex3
Or transform that continuous range to a category (such as a YEAR) with a SAS function, on the fly:

where_ex4

SAS Enterprise Guide has always allowed you to subset data easily, but only by adding a step in the process flow (either a Query Builder task or Filter and Sort task). Now, you can accomplish this "in place", saving time and storage as you explore. Let us know what you think of the new feature -- either here in the blog comments or within the SAS Enterprise Guide community.

Post a Comment

Using FILENAME ZIP to unzip and read data files in SAS

I've written about how to use the FILENAME ZIP method to read and update ZIP files in your SAS programs. The ZIP method was added in SAS 9.4, and its advantage is that you can accomplish more in SAS without having to launch external utilities such as WinZip, gunzip, or 7-Zip.

Several readers replied with questions about how you can use the content of these ZIP files within your SAS program. The basic scenario is: "I've got some data files in my ZIP archive. I want to use SAS to unzip these and then use them as data within my SAS process. Can I do this?"

Yes, you can -- but it does require an extra step. Even though FILENAME ZIP can show you the contents and structure of your ZIP file, most SAS procedures cannot access the content directly while it's in the archive. So, the additional step is to copy the file to another location, effectively extracting it from the ZIP file.

As an example, I created a ZIP file with two files and a subfolder:

data.zip
  |__ sas_tech_talks_15.xlsx
  |__ sas/
      |__ instanttitles.sas7bdat

This SAS program helps me to discover how FILENAME ZIP sees the file:

filename inzip ZIP "c:\projects\data.zip";
 
/* Read the "members" (files) from the ZIP file */
data contents(keep=memname isFolder);
 length memname $200 isFolder 8;
 fid=dopen("inzip");
 if fid=0 then
  stop;
 memcount=dnum(fid);
 do i=1 to memcount;
  memname=dread(fid,i);
  /* check for trailing / in folder name */
  isFolder = (first(reverse(trim(memname)))='/');
  output;
 end;
 rc=dclose(fid);
run;
 
/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;
run;

Output:

        Files in the ZIP file                                         
 memname                       isFolder
 sas/                             1  
 sas/instanttitles.sas7bdat       0  
 sas_tech_talks_15.xlsx           0  
                N = 3

With this information, I can now "copy" the XLSX file out of the ZIP file and then import it into a SAS data set. Notice how I can use the "member" syntax (fileref with the file I want in parentheses) to address a specific file in the ZIP archive. I want to copy just from the actual files, and not the folder-level entries.

/* identify a temp folder in the WORK directory */
filename xl "%sysfunc(getoption(work))/sas_tech_talks_15.xlsx" ;
 
/* hat tip: "data _null_" on SAS-L */
data _null_;
   /* using member syntax here */
   infile inzip(sas_tech_talks_15.xlsx) 
       lrecl=256 recfm=F length=length eof=eof unbuf;
   file   xl lrecl=256 recfm=N;
   input;
   put _infile_ $varying256. length;
   return;
 eof:
   stop;
run;
 
proc import datafile=xl dbms=xlsx out=confirmed replace;
  sheet=confirmed;
run;

Sample output from my SAS log:

NOTE: The infile INZIP(sas_tech_talks_15.xlsx) is:
      Filename=c:\projects\data.zip,
      Member Name=sas_tech_talks_15.xlsx

NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The file XL is:
      Filename=C:\SAS Temporary Files\_TD396_\Prc2\sas_tech_talks_15.xlsx,
      RECFM=N,LRECL=256,File Size (bytes)=0,
      Last Modified=11May2015:11:38:59,
      Create Time=11May2015:11:20:23

NOTE: A total of 55 records were read from the infile library INZIP.
NOTE: 55 records were read from the infile INZIP(sas_tech_talks_15.xlsx).
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

To use the SAS data set in the file, I need to copy it into a location shared by a SAS library. In this example, I will again use the WORK location. Because my SAS data set is in a logical subfolder (named "sas") within the archive, I need to include that path as part of the member syntax on the INFILE statement.

/* Copy a zipped data set into the WORK library */
filename ds "%sysfunc(getoption(work))/instanttitles.sas7bdat" ;
 
data _null_;
   /* reference the member name WITH folder path */
   infile inzip(sas/instanttitles.sas7bdat) 
	  lrecl=256 recfm=F length=length eof=eof unbuf;
   file   ds lrecl=256 recfm=N;
   input;
   put _infile_ $varying256. length;
   return;
 eof:
   stop;
run;
 
proc contents data=work.instanttitles;
run;

Partial output in my example:

                             Files in the ZIP file                          
                             The CONTENTS Procedure

 Data Set Name        WORK.INSTANTTITLES            Observations          1475
 Member Type          DATA                          Variables             6   
 Engine               V9                            Indexes               0   
 Created              01/29/2015 15:09:54           Observation Length    248 
 Last Modified        01/29/2015 15:09:54           Deleted Observations  0   
 Protection                                         Compressed            NO  
 Data Set Type                                      Sorted                NO  
 Label                                                                        
 Data Representation  WINDOWS_64                                              
 Encoding             wlatin1  Western (Windows)                              

Of course, all of this can be automated even further by writing SAS code that automatically iterates through the ZIP file member names and copies/imports each of the members as needed.

Post a Comment

How to convert a Unix datetime to a SAS datetime

timeI watched with wonder as each of my daughters learned how to "tell time." Early in their primary school careers, they brought home worksheets that featured clock faces with big-hand/little-hand configurations that they had to decipher, and exercises that asked them to draw the hands as they should appear given a specific elapsed time. Now my daughters have digital watches, and have even adopted my habit of wearing them with the face on the underside of the wrist and setting the display to 24-hour time -- which serves to confound their friends.

But, to my disappointment, their school has failed to teach my daughters how to read a SAS datetime value, or even the more ubiquitous Unix (POSIX) datetime value. As a parent of the 21st century, I feel it is my duty to fill this gap in their education.

First, let's review how these datetime values are represented:

  • A SAS datetime value is the number of seconds that have elapsed since midnight of January 1, 1960 (01JAN1960:00:00:00).
  • A Unix (or POSIX) datetime value is the number of seconds* that have elapsed since midnight of January 1, 1970 (01JAN1970:00:00:00).

Conversion from Unix to SAS representation is simple math:

/* Number of seconds between 01JAN1960 and 01JAN1970: 315619200 */
sasDT = unixDT + 315619200;

Having trouble remembering that 9-digit constant? Then you can get SAS to infer that part for you and use the dhms() function:

/* DHMS function calculates datetime when you provide values for */
/*    date, hour, minute, and seconds                            */
/* In this case, "seconds" is a very high value!                 */
sasDT = dhms('01jan1970'd, 0, 0, unixDT);

Raw Unix times are often expressed as UTC, and you might prefer to show local times in your SAS reporting. That's a simple calculation with the (undocumented) gmtoff() function:

/* Convert from UTC to local time using GMTOFF */
sasDT = dhms('01jan1970'd,0,0, unixDT + gmtoff());

Update: Bruno told me that TZONEOFF is an official SAS function in SAS 9.4! See the comments for more details how you can use it instead of GMTOFF.

I have one more variation that I use every day. I have a project that reports against a database that stores transaction times in milliseconds instead of seconds. (This uses the database's intrinsic int8 or BIGINT type.) SAS datetime values can represent fractions of seconds, so this conversion simply requires that I divide by 1000. I can use a different SAS datetime format to see the precision (if that's what I want, though it's usually not).

/* for more precision */
format sasDT datetime22.3; 
/* mlliseconds from transaction database */
unixDTms = 1429193343362;
sasDT_ms = dhms('01jan1970'd,0,0, unixDTms/1000);
/* result: 16APR2015:14:09:03.362 */

Here's a complete program that you can experiment with:

data t;
  format sasDT_const 
    sasDT_dhms 
    sasDT_local datetime20.;
 
  /* for more precision */
  format sasDT_ms datetime22.3;
 
  /* Unix test datetime of 16APR2015:14:09:03 UTC */
  unixDT =   1429193343;
 
  /* mlliseconds from transaction database */
  unixDTms = 1429193343362;
 
  /* Number of seconds between 01JAN1960 and 01JAN1970: 315619200 */
  sasDT_const = unixDT + 315619200;
 
  /* DHMS function calculates datetime given values for */
  /*    date, hour, minute, and seconds                 */
  /* In this case, "seconds" is a very high value!      */
  sasDT_dhms = dhms('01jan1970'd,0,0,unixDT);
 
  /* converting a value from milliseconds */
  sasDT_ms = dhms('01jan1970'd,0,0,unixDTms/1000);
 
  /* Convert from UTC to local time using GMTOFF */
  /* use TZONEOFF function in 9.4! */
  sasDT_local = dhms('01jan1970'd,0,0,unixDT + gmtoff()); 
run;

Sample output from this program (which I ran from my EDT timezone):
time_out
* I intentionally avoided any discussion of leap seconds that go along with UTC, but you can learn more about that here.

See also

Read a Microsoft datetime value into a SAS datetime value

Post a Comment

How to compare SAS programs in SAS Enterprise Guide

Copy/paste is my favorite method for creating new SAS programs. In my work projects, I maintain a sort of genealogy of SAS programs, because the DNA of one program can be used to spawn many other SAS programs as its progeny. When things (inevitably) aren't working as I intend in one of these next-generation programs, I find that I need to compare that program against the original to see what I've missed.

My file-comparison tool of choice is WinMerge. It's free and easy to install and use. That's why I'm really pleased with the new Compare Programs feature in SAS Enterprise Guide 7.1.

Here is how you set it up.

1. Install a file comparison tool.

I like WinMerge. But some of my colleagues prefer Beyond Compare or KDiff. Your choice! Most of these tools can integrate with your file system so that you can compare any two files from Windows Explorer. They also integrate with source control systems, if you use those (Git, Subversion, etc.).

2. Set your File Comparison options in SAS Enterprise Guide

Select Tools->Options, then the File Comparison tab. Check "Use custom file comparison tool" and then select your tool of choice. There are built-in presets for the three comparison tools that I mentioned, or you can go "off road" and select your favorite with "Other".

Select your DIFF tool

3. Select the two files that you want to compare

In the Project Tree or Process Flow, you can select multiple programs by using the standard Ctrl+Click sequence or by "rubber banding" a selection with the mouse (Process Flow only). When you have exactly two programs selected, right-click and select Compare.

Compare programs menu
The comparison tool launches with the contents of each program loaded for comparison. Here's what mine looks like:

Code compare in action
Some additional notes:

  • The content you see in the comparison tool is a copy of your program. Therefore, if you use the comparison tool to make changes, these won't be reflected in your SAS Enterprise Guide project. Of course, you can copy the changed content to the Windows clipboard and paste it into the SAS Enterprise Guide program editor.
  • You can do more with file comparison when using the Program History feature, which I introduced in this blog post about new programming features in SAS Enterprise Guide 7.1.
Post a Comment

Project Reviewer task and Copy Files task: refreshed for v7.1

Two popular SAS custom tasks have recently been updated for SAS Enterprise Guide 7.1. Most custom tasks that I've shared will work without modification across releases, but these two required a special rebuild due to some internal product API changes.

The Project Reviewer task allows you to see a detail view of the tasks/programs within your SAS Enterprise Guide process flows. The listing shows the task names and types, when they were added to the project and last modified, and how much time they take to run. You can also create a SAS-based ODS report from the data, which is a great way to use SAS Enterprise Guide to report on your SAS Enterprise Guide project. (Wow, that's so meta!)

projectrev71
Download Project Reviewer: See this blog post for more information and download link

The Copy Files task is used to move files from your local PC to your SAS session, or from your SAS session to your local PC. This allows you to capture these file transfer steps within your process flow so that they are repeatable. The task supports the use of macro variables and wildcards, which gives you lots of flexibility when designing a process in which the file names and quantities are subject to change.

Copy Files task
Download Copy Files: See this blog post for more information and download link

If you use these tasks (or other custom tasks) and find them useful, please leave a comment! I love to hear how the tasks are being used. And sometimes useful custom tasks find their way into the actual product in a future release. Your feedback is important to help to make that happen.

Other related articles

Post a Comment

Can you learn how to create custom tasks?

To develop a custom task for use in SAS Enterprise Guide (or SAS Add-In for Microsoft Office), you need a few things:

  • A working knowledge of SAS and SAS Enterprise Guide
  • Microsoft Visual Studio (the free Express edition will do, though the Professional edition is better)
  • Some experience with C# or Visual Basic -- or at least with an object-oriented framework such as Java or C++.

It helps to have a "how to" guide, and I've written one of those. And if you're the sort of person who learns best in the classroom, you can now take a class.

Are you wondering whether the class right for you? Watch this video (about 4 minutes long, in which I do my best Troy McClure impersonation). I'll explain what custom tasks are and what you need to know in order to build them. I'll also explain what you can expect to learn from the class, and the skills you should have before you enroll.


To enroll:
Visit the support.sas.com SAS Training site and find an offering that works with your schedule and location.

Post a Comment

How to split one data set into many

splitdataBack in the day when the prison system forced inmates to perform "hard labor", folks would say (of someone in prison): "He's busy making little ones out of big ones." This evokes the cliché image of inmates who are chained together, forced to swing a chisel to break large rocks into smaller rocks. (Yes, it seems like a pointless chore. Here's a Johnny Cash/Tony Orlando collaboration that sets it to music.)

SAS programmers are often asked to break large data sets into smaller ones. Conventional wisdom says that this is also a pointless chore, since you can usually achieve what you want (that is, process a certain subset of data) by applying a WHERE= option or FIRSTOBS=/OBS= combination. Splitting a data set creates more files, which occupy more disk space and forces more I/O operations. I/O and disk access is often the most expensive part of your SAS processing, performance-wise.

But if the boss asks for broken-up data sets, you might as well spend the least possible effort on the task. Let's suppose that you need to break up a single data set into many based on the value of one of the data columns. For example, if you need to break SASHELP.CARS into different tables based on the value of Origin, the SAS program would look like:

DATA out_Asia;
 set sashelp.cars(where=(origin='Asia'));
run;
DATA out_Europe;
 set sashelp.cars(where=(origin='Europe'));
run;
DATA out_USA;
 set sashelp.cars(where=(origin='USA'));
run;

I'm going to admit right now that this isn't the most efficient or elegant method, but it's something that most beginning SAS programmers could easily come up with.

Writing the above program is easy, especially since there are only 3 different values for Origin and I've memorized their values. But if there are more discrete values for the "split-by" column, the task could involve much more typing and has a high possibility for error. This is when I usually use PROC SQL to generate the code for me.

If you've read my article about implementing BY processing for an entire SAS program, you know that you can use PROC SQL and SELECT INTO to place data values from a data set into a macro variable. For example, consider this simple program:

proc sql;
 select distinct ORIGIN into :valList separated by ',' from SASHELP.CARS;
quit;

It creates a macro variable VALLIST that contains the comma-separated list: "Asia,Europe,USA".

But we can use SAS functions to embellish that output, and create additional code statements that weave the data values into SAS program logic. For example, we can use the CAT function to combine the values that we query from the data set with SAS keywords. The results are complete program statements, which can then be referenced/executed in a SAS macro program. I'll share my final program, and then I'll break it down a little bit for you. Here it is:

/* define which libname.member table, and by which column */
%let TABLE=sashelp.cars;
%let COLUMN=origin;
 
proc sql noprint;
/* build a mini program for each value */
/* create a table with valid chars from data value */
select distinct 
   cat("DATA out_",compress(&COLUMN.,,'kad'),
   "; set &TABLE.(where=(&COLUMN.='", &COLUMN.,
   "')); run;") into :allsteps separated by ';' 
  from &TABLE.;
quit;
 
/* macro that includes the program we just generated */
%macro runSteps;
 &allsteps.;
%mend;
 
/* and...run the macro when ready */
%runSteps;

Here are the highlights from the PROC SQL portion of the program:

  • SELECT DISTINCT ensures that the results include just one record for each unique value of the variable.
  • The CAT function concatenates a set of string values together. Note that CATX and CATS and CATT -- other variations of this function -- will trim out white space from the various string elements. In this case I want to keep any blank characters that occur in the data values because we're using those values in an equality check.
  • The program calculates a name for each output data set by using each data value as a suffix ("OUT_dataValue"). SAS data set names can contain only numbers and letters, so I use the COMPRESS function to purge any invalid characters from the data set name. The 'kad' options on COMPRESS tell it to keep only alpha and digit characters.
  • The resulting program statements all end up in the &ALLSTEPS macro variable. I could just reference the &ALLSTEPS variable in the body of the SAS program, and SAS would run it as-is. Instead I chose to wrap it in the macro %runSteps. This makes it a little bit easier to control the scope and placement of the executable SAS program statements.

"By each value of a variable" is just one criterion that you might use for splitting a data set. I've seen cases where people want to split the data based on other rules, such as:

  • Quantity of observations (split a 3-million-record table into 3 1-million-record tables)
  • Rank or percentiles (based on some measure, put the top 20% in its own data set)
  • Time span (break up a data set by year or month, assuming the data records contain a date or datetime variable)

With a small modification, my example program can be adapted to serve any of these purposes. What about you? Are you ever asked to split up SAS data sets, and if so, based on what criteria? Leave a comment and tell us about it.

Post a Comment

SAS employees are now easier to spot on Communities

Since the launch of Communities on SAS, hundreds of SAS employees have been among you. Some SAS employees made themselves known by selecting a telling user name (such as Cynthia@SAS), but others remained camouflaged or incognito, keeping their secret identities like the SAS superheroes they are.

That's about to change. This week, we are revealing all SAS staffers on the Communities by changing their status indicators to "SAS Employee", featuring the familiar SAS logo. We're making this change to help make Communities on SAS more valuable and relevant to you. We know that you enjoy interacting with SAS employees here; now you'll have a clearer view into when that's happening.

You'll see this take effect in several places. For example, for the communities that have a "Top Participants" roll, you might see some SAS folks appear in the list:

sasemp_statusroll
When you hover your mouse cursor over a user name, or drill down to see more details about a community member, you will also see the "SAS Employee" designation.

sasemp_user
With the introduction of the "SAS Employee" status, you will no longer see the familiar "rank" status for these users. So you won't be able to tell -- at a glance -- whether a SAS Employee has achieved Expert, Master, or merely Novice levels of contribution. We hope that you'll appreciate all SAS employees as experts in their respective areas, even if the ranking isn't spelled out for you.

We hope that you enjoy the change. If you have feedback or ideas, please let us know!

Post a Comment

Using SAS Enterprise Guide with your local SAS installation

serverlist_withlocalIf you have SAS Enterprise Guide and SAS for Windows installed on a machine together, they should Just Work. There is no special setup required. But...what if they don't?

I've posted an article in the SAS Enterprise Guide community about this topic. Read the article to learn:

  • How to select "No profile" mode in SAS Enterprise Guide.
  • How to add your Local SAS into the mix when you're connected to a SAS Metadata environment
  • How to "fix" your local system when SAS Enterprise Guide can't find your local SAS installation.

See the full article >> Using SAS Enterprise Guide with your local SAS installation

If you find the article to be helpful, please give it a rating. (You will need to sign in with your SAS profile to add a rating.) Positive ratings will make it easier for other users to find the information. And if you think the article needs something else to make it more useful, leave a comment on the article or here on the blog.

Side note about Communities on SAS

I've always been a big fan of the Communities on SAS, where thousands of SAS practitioners interact to learn from each other. A good number of SAS employees also participate in the discussions, which means that many of the answers about SAS technology are coming directly from the developers who created it. But it's your participation that makes the communities rich and engaging.

And now I have a more professional interest in the communities, as I've just moved into the role of managing our SAS Online Communities program. My team of community managers would love to hear from you about what you like and what you think could be improved. We have big plans for the communities program, and your input is essential to help us build the space where you want to spend your time. Post back in the comments here, or send us your feedback using this form (select Discussion Forums or Communities as the category).

Post a Comment