Raiders of the lost spreadsheet

0

Have you ever peered intently into an unfamiliar data delivery directory, realized what was in it, rolled over onto your side, stared blankly into the distance, and dejectedly uttered something akin to:

"Spreadsheets! Why did it have to be spreadsheets?"

If so, then we are definitely on the same page. Why does it always have to be spreadsheets?

The answer to that question is actually pretty obvious when you think about it. The popularity of Microsoft Office has made Excel one of the most popular mediums for storing data. It is used extensively in grade schools, middle schools, high schools, and colleges. People with home businesses use it; office administrators use it; clerical staff use it; scientists use it; lawyers use it; hospital workers use it; Federal, state and local government workers use it; and programmers use it too.

An individual who needs to store data in electronic format and then process it may not have SAS, or C++, or JAVA, or C#, or PYTHON, or PHP, or R, or MATLAB, or ColdFusion, or FOCUS, or FORTRAN, or Groovy, or JavaScript, or MOBY, or MUMPS, or NATURAL, or Perl, or PHP, or PL/SQL, or PowerShell, or Python, or S-PLUS, or Visual Basic installed on his or her PC. But, that person will undoubtedly have Microsoft Office and thus have Excel. That is why it always has to be spreadsheets.

But, processing data stored in spreadsheets is not really a problem for intrepid SAS programmers. When I go on a data exploration expedition where there is a good chance of encountering spreadsheets, I pack the usual: my brown leather jacket, fedora, and bullwhip. But, most importantly, I put SAS/Access Interface to PC Files into my backpack.

SAS/Access Interface to PC Files is a SAS for Windows product that allows you to read, write, and update data in Excel and Access. As such, it is a must-have for your Windows SAS installation.

Here is an example of a program that I use to map out the contents of an unexplored spreadsheet:

ods rtf file="G:\BigProject\Worksheets in NewDataSpreadsheet.rtf";
 
libname xlslib "G:\NewProject\DeliveryDirectory\NewDataSpreadheet.xlsx" access=readonly;
 
proc sql;
create table WorkSheets as
select distinct(compress(MEMNAME,"',$")) as WorkSheet_Name,
name as ColumnName
from dictionary.columns
where libname = 'XLSLIB';
quit;
 
proc print noobs data=WorkSheets;
var WorkSheet_Name ColumnName;
title1 "Workseets in NewDataSpreadsheet.xlsx";
run;
 
ods rtf close;

The ODS statement specifies that my report will be created as an RTF document. Because I have SAS/Access Interface to PC Files, the LIBNAME statement allocates the NewDataSpreadheet.xlsx spreadsheet much the same way as it would for a SAS data set. (Notice that I specified access=readonly so that I do not accidentally update the spreadsheet). Since I have "LIBNAME-d" the spreadsheet, information about its worksheets and column names is now available in the SAS Dictionary Tables.

I use PROC SQL to extract the name of each worksheet (variableWorkSheet_Name) in the Excel file; and the names of the columns (variableColumnName) within each worksheet and then plop them into a SAS data set for further exploration. The code snippet compress(MEMNAME,"',$") gets rid of the annoying quotes and dollar signs that are found in spreadsheet MEMNAMEs. Then, I use the PRINT procedure to create a report. A simple, neat, quick, and easily macro-tized piece of code.

Here are several good references that you can use to find out more about processing spreadsheets with SAS:

Armed with those resources, some pluck, a sense of adventure, and with your own trusty copy of SAS/Access Interface to PC Files, you too can be a raider of the lost spreadsheet!

Best of luck in all your SAS endeavors!

Share

About Author

Michael A. Raithel

Senior systems analyst for Westat and SAS Press author

Michael A. Raithel is a senior systems analyst for Westat, an employee-owned contract research organization in the Washington, DC area. An internationally recognized expert in the use of SAS software in mainframe and UNIX environments, he is the author of over 25 SAS technical papers and is a popular lecturer at SAS Global Forum and at regional SAS conferences. He has written four books for SAS; the most recent book is How to Become a Top SAS Programmer. A copy of the first edition Tuning SAS Applications in the MVS Environment, resides in the Smithsonian Institution of American History’s Permanent Research Collection of Information Technology.

Comments are closed.

Back to Top