The little SAS program’s official name was Extract_Transform_Load_ 0314.sas. But, that name was much too formal, way too long, and did not roll off of the tongue very easily at all. So, everybody simply called her: ETL Pi.
ETL Pi was conceived in a 2-hour project strategy meeting in conference room 427C on a Monday afternoon in early May. She was born three days later in Susan G.’s cubicle in the west wing of the sixth floor. Susan was a mid-level programmer/analyst with seven years of SAS programming experience. She first crafted ETL Pi to perform only the basic functions of extracting data from the Oracle database and loading it an Excel spreadsheet for use by the analysts. Susan wanted to make sure that she had the basic data transportation mechanisms—the Extract and Load parts of the ETL process—correctly programmed before moving on. So, ETL Pi began her existence as a compact 18-line SAS program that performed the bare minimum functions on test data.
After Susan was confident that she had the Extract and Load mechanisms correct, she added the lines of code that did the Transform part of the ETL process by creating constructed variables from those extracted from the Oracle database. She sent the resulting Excel files to the analysts for review and approval. The analysts had a lot of comments, ideas, proposals, recommendations, suggestions, and counter-suggestions; so Susan modified, changed, revised, amended, adjusted, reformed and altered ETL Pi accordingly until the analysts were finally satisfied. This resulted in ETL Pi growing to modest 57 lines of lean, efficient SAS code.
Susan productionized ETL Pi by first creating a directory for the Excel file that would be created each night and granting permissions to that directory to the analysts. Then, she modified ETL Pi so that it would be run on the UNIX server where the production Oracle database was housed. Susan did this by adding five SAS/Connect statements to ETL Pi specifying how to connect to the UNIX server and specifying which portions of the SAS code would be sent to execute on the server. The additions of the OPTIONS, SIGNON, RSUBMIT, ENDRSUBMIT, and SIGNOFF statements brought ETL Pi to 62 lines of SAS code.
Finally, Susan scheduled ETL Pi to run at 10:00pm every night from her PC workstation. To do this, she first created a .bat file that held the command to execute SAS and which also pointed at the full path to the ETL Pi SAS program. Next, she invoked the Windows Task Scheduler and went through the several manual point-and-click steps to schedule ETL Pi to run at 10:00pm every night. Satisfied with her preparations, Susan went home that evening anticipating a good first run and a clean SAS log.
ETL Pi was nervous as the clock slowly ticked away towards 10:00pm. She was used to working with small amounts of data and jostling for network bandwidth during the day. But, she didn’t know quite what to expect when running on the UNIX server where the big, important corporate programs ran. Tonight promised to be quite an adventure.
At 10:00pm sharp, the Windows Task Scheduler executed the .bat file. That invoked SAS in batch mode and started-up ETL Pi. Once initiated, ETL Pi executed an OPTIONS statement that contained two SAS/Connect-oriented options. COMMAMID=TCP specified for SAS to use the TCP communications access method, and REMOTE=UNIX427 specified that ETL Pi would be working on the UNIX427 server. Next, ETL Pi executed a SIGNON statement that pointed to a SAS/Connect script that actually initiated the connection between Susan’s PC and the UNIX server. The script executed and instantiated a SAS session on the UNIX server that ETL Pi could directly communicate with.
The next block of ETL Pi’s SAS code was sandwiched between RSUBMIT and ENDRSUBMIT SAS/Connect statements. So, that block of code would be submitted directly to the UNIX server and be processed there.
Oh, goodness; here I go! ETL Pi thought. Then,
Wheeeeeeeeeee!
…she exclaimed as she was whizzed at dizzying speeds across the network, through half a mile of fiber-optic cable, and into the SAS session executing on the UNIX427 server sitting in the data center.
ETL Pi emerged into a very crowded UNIX server. There were system backups, database updates, database backups, five java programs, and dozens of other SAS programs all jostling for computer resources. Not intimidated, ETL Pi bravely elbowed her way into the mix of programs, trying to get as much memory, disk space, and processor cores as she possibly could. I have a job to do, she thought. And I am going to do my best to get it done.
ETL Pi executed a LIBNAME statement that had the proper connection string information to access a production Oracle database using SAS/Access Interface to Oracle. Next she executed the SQLGENERATION=DBMS option in an OPTIONS statement. That option allows SAS to use in-database procedures which executes select SAS procedures within the Oracle database itself in order to take advantage of the database’s native processing speed and facilities.
Next, ETL Pi submitted a PROC SUMMARY to summarize specific data in one of the large tables in the Oracle database. Because of the SQLGENERATION option, that summary was efficiently done within the Oracle database and the small result set was returned to the SAS work library within ETL Pi’s UNIX SAS session.
That was all that ETL Pi needed to do on the UNIX server. She finished up by using PROC DOWNLOAD to download the summarized SAS data set in the work library on UNIX to the SAS work library on the PC.
Okay, I’m going home now, she thought as the ENDRSUBMIT statement executed.
Wheeeeeeeeeee!
…she shouted in exhilaration as control was passed back to the SAS session running back on the PC.
Once she was back home, ETL Pi executed two ODS statements, a DATA step, and the Report procedure that created the Excel spreadsheet for Susan’s analyst clients. Then, with a heavy sigh, she executed the SIGNOFF statement which terminated the UNIX SAS session and closed the connection with the server.
Just before ending, ETL Pi took a quick look at the SAS log. Not a WARNING or ERROR anywhere in the log, she beamed proudly. Susan is going to be very happy when she reviews it in the morning. Now that was fun and I can’t wait to do it again tomorrow night!
And, with that, the little SAS program’s big night out came to an end.
References
SAS 9.4 Programming Documentation Help Center
SAS Connect
Report Procedure
SAS/ACCESS Interface to Oracle
SQLGENERATION system option
Output Delivery System (ODS)
2 Comments
May we please have a copy of the code? It sounds like an simply elegant solution for a common task.
THANKS!
Hi Michael, it's vividly story of a SAS program. Thanks for sharing.