I've said it before: spreadsheets are not databases. However, many of us use spreadsheets as if they were databases, and then we struggle when the spreadsheet layout does not support database-style rigor of predictable rows, columns, and variable types -- the basic elements we need for analytics and reporting. If you're using SAS to read data from Microsoft Excel, what can you do when the data you need doesn't begin at cell A1?
By design, SAS can read data from any range of cells in your spreadsheet. In this article, I'll describe how to use the RANGE statement in PROC IMPORT to get the data you need.
With SAS 9.4 and later, SAS recommends using DBMS=XLSX for the most flexibility. It works on all operating systems without the need for additional components like the PC Files Server. Your Excel file must be in the Excel 2007-or-later format (XLSX). You do need a license for SAS/ACCESS to PC Files. (Just learning? These DBMS=XLSX techniques also work in SAS University Edition.)
If your Excel data does not begin in cell A1 (the default start point for an import process), then you can add a RANGE= value that includes the specific cells. The easiest method is to use a Named Range in Excel to define the exact boundaries of the data.
How to add a Named Range
To define a named range in Excel, highlight the range of cells to include and type the new name of the range in the Name Box (the field to the left of the formula bar), then press Enter.
Then save the Excel file.
Then to import into SAS, specify that range name in the RANGE= option:
proc import datafile="/myprojects/myfile.xlsx"
  out=mydata
  dbms=xlsx
  replace;
  range="myspecialrange";
run;
Using Excel notation for a cell range
What if you don't know the range ahead of time? You can use PROC IMPORT to read the entire sheet, but the result will not have the column headers and types you want. Consider a sheet in which the data sits in the middle of the worksheet, in cells E7 through K17, rather than starting at A1.
This code will read it:
proc import datafile="/myprojects/middle.xlsx"
  out=mid
  dbms=xlsx
  replace;
run;
But the result will contain many empty cells, and all of the values will be read as character types.
With additional coding, you can "fix" this result in another pass using a DATA step. Or, if you're willing to add the RANGE option with the Excel notation for the specific cell range, you can read it properly in the first pass:
proc import datafile="/myprojects/middle.xlsx"
  out=mid
  dbms=xlsx
  replace;
  range="Sheet1$E7:K17";
run;
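To confirm that a ranged import picked up the headers and column types you expect, you can inspect the output data set. This is a minimal sketch that assumes the PROC IMPORT step with RANGE= has already created WORK.MID; the choice of five preview rows is arbitrary.

```sas
/* Sketch: check the result of the ranged import.            */
/* Assumes WORK.MID was created by the PROC IMPORT step with */
/* range="Sheet1$E7:K17".                                    */

/* List variables in column order, with their types and lengths. */
proc contents data=mid varnum;
run;

/* Preview a few rows to confirm the header row was used for names. */
proc print data=mid (obs=5);
run;
```

If numeric columns still show up as character, the range probably includes a stray header or note row, and adjusting the starting cell is usually the first thing to try.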
How to "discover" the structure of your Excel file
You can also use LIBNAME XLSX to read entire sheets from Excel, or simply as a discovery step to see what sheets the Excel file contains before you run PROC IMPORT. However, LIBNAME XLSX does not show the Excel named ranges.
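As a rough sketch of that discovery step, the following assigns a library to the workbook and lists its members, where each sheet appears as a table. The libref XL is a hypothetical name chosen for illustration.

```sas
/* Sketch: list the sheets in a workbook with LIBNAME XLSX. */
/* The libref XL is a hypothetical name.                    */
libname xl xlsx "/myprojects/middle.xlsx";

/* Show the library directory only (NODS suppresses per-member detail). */
proc contents data=xl._all_ nods;
run;

/* Release the workbook when finished. */
libname xl clear;
```

The sheet names reported here are the ones to use in the RANGE= value, ahead of the `$` and the cell addresses.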
On SAS for Windows systems, you can use LIBNAME EXCEL (32-bit) or LIBNAME PCFILES (64-bit) to reveal a little more information about the Excel file.
libname d pcfiles path="c:\myprojects\middle.xlsx";
proc datasets lib=d;
quit;
/* always clear the libname, as it locks the file */
libname d clear;
Note that DBMS=XLSX does not support some of the options we see in the legacy DBMS=XLS (which supports only old-format XLS files), such as STARTROW and NAMEROW. DBMS=XLSX does support GETNAMES (treats the first record of the sheet or range as the variable names). See the full reference for Excel file import/export in the SAS documentation.
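Because STARTROW and NAMEROW are not available, one way to get a similar effect with DBMS=XLSX is to combine RANGE with GETNAMES. The sketch below assumes the same example sheet, with headers in row 7 and data in rows 8 through 17; the output name MID_NONAMES is hypothetical.

```sas
/* Sketch: skip the header row by starting the range one row lower. */
/* Assumes headers are in row 7, so the data rows begin at row 8.   */
proc import datafile="/myprojects/middle.xlsx"
  out=mid_nonames          /* hypothetical output data set name */
  dbms=xlsx
  replace;
  range="Sheet1$E8:K17";   /* data rows only, no header row */
  getnames=no;             /* do not treat the first row of the range as names */
run;
```

With GETNAMES=NO, SAS assigns default variable names, so you would typically follow this with a RENAME statement or data set option to apply meaningful names.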