When speed is required at scale, nothing beats parallel processing your data in memory. And IMHO, when parallel processing your data in memory, it's hard to beat DS2 and SAS® Viya® with the amazing Cloud Analytic Services (CAS)! While working on the second edition of "Mastering the SAS DS2 Procedure", I gained access to a small SAS Viya installation and was able to do some testing. My test case was scoring a large Teradata table using a DS2 data step. The table has over 106,000 rows and is quite wide. Running the scoring code in 4 parallel threads on SAS 9.4M5 on 64-bit Windows 10 with 4 CPUs took about 4 minutes.
I wondered how much faster it would run in-database on Teradata. After all, the Teradata instance was pretty beefy, with 576 AMPs total available for processing. I executed the same program with in-database processing enabled, and the process took only 17 seconds! Of course, the next thing I wanted to try was CAS...
The SAS Viya installation to which I had access was using CAS running distributed on two Linux x64 nodes. For CAS processing, I wanted to be sure I accounted for the time required to front load the data from Teradata to CAS, so I could make an apples-to-apples comparison. Loading the data took about 1 second! I'd love to have data pipes like that in MY house. Running the DS2 scoring process on CAS took an additional 6 seconds to complete, for a grand total of 7 seconds. That's speed I can respect!
Now, CAS has DS2 action sets baked in, and you could learn CASL and run your DS2 that way. But it's really quite simple to run your DS2 process in CAS using our old, familiar PROC DS2. First, we'll need to connect to CAS from our SAS 9.4 session. We'll set some system options and use a CAS statement to fire up a named CAS session we can use for further processing:
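/* Point the SAS 9.4 session at the CAS server (host name and port are placeholders) */
OPTIONS CASHOST="mycas.mydomain.com" CASPORT=1234;
/* Start a named CAS session, mycas, with casuser as the active caslib */
cas mycas sessopts=(caslib=casuser timeout=1800 locale="en_US");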
Now our SAS 9.4 session is aware of our CAS session named 'mycas'. Loading data to CAS is pretty easy, too. There's a PROC for that - PROC CASUTIL. The process looks something like this:
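proc casutil;
   /* Load the campaign table from the db_data libref into the casuser caslib; */
   /* PROMOTE makes the in-memory table visible beyond this session */
   load data=db_data.campaign outcaslib="casuser" casout="campaign" promote;
run;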
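If you want to confirm the table actually landed in memory before going further, a quick check along these lines should do it. This is just a sketch; it assumes the same casuser caslib used in the load step above.

```sas
/* Sketch: list the in-memory tables in the casuser caslib */
/* (the caslib name is assumed to match the one used for the load) */
proc casutil;
   list tables incaslib="casuser";
quit;
```

The campaign table should show up in the listing if the load succeeded.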
To run a DS2 process in CAS, make sure the resources you need are pre-loaded to the CAS session, and invoke PROC DS2 with the sessref= option:
proc ds2 sessref=mycas;
That's all there is to it! If your DS2 programs read and write to CAS tables, the code will execute distributed on CAS without any further effort on your part.
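For illustration, here is roughly what a complete threaded scoring program might look like when pointed at CAS. This is a minimal sketch, not my actual scoring code: the thread name score_th, the output table campaign_scored, the input columns income and age, and the scoring formula are all made up for the example. It also assumes that, with sessref= in effect, the first level of a two-level table name resolves to a caslib.

```sas
/* Hypothetical sketch: score_th, campaign_scored, income, age and the */
/* scoring formula are assumptions for illustration only               */
proc ds2 sessref=mycas;
   /* Thread program: CAS runs this where the data lives */
   thread score_th / overwrite=yes;
      dcl double score;
      method run();
         set casuser.campaign;
         /* Placeholder scoring logic */
         score = 0.5 * income + 0.1 * age;
      end;
   endthread;

   /* Data program: collect the rows produced by the thread program */
   data casuser.campaign_scored / overwrite=yes;
      dcl thread score_th t;
      method run();
         set from t;
      end;
   enddata;
run;
quit;
```

Note that the only CAS-specific piece here is the sessref= option on the PROC DS2 statement; the thread and data programs are ordinary DS2.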
This is just one of the many new things that convinced me it was time to write a second edition of "Mastering the SAS DS2 Procedure: Advanced Data Wrangling Techniques".
Other new topics include:
- more object-oriented programming information showcasing the new PRIVATE methods and variables available in packages
- the new and blazing fast PCRX packages which put the "express" in "regular expressions"
- a section on the new LIBS= option that takes the sting out of connection strings
- the new DSTODS2 procedure that makes converting existing DATA step code to DS2 so much easier
and a whole lot more.
I'll soon be off to SAS Global Forum 2018 in Denver. If you're attending, come hear me present my paper "Working with Big Data in SAS" on Monday at 4:30. Or look me up in the Quad bookstore Monday evening from 5:30 to 6:30, or Tuesday from 9:00 to 9:30 / 12:30 to 13:00. Bring or buy a copy of my DS2 book, and I'll gladly sign it for you! Don't forget to say "Howdy" if we bump into each other in the Quad, Hands-on Workshops or wandering the halls. I love to meet new people and gab about SAS!
Until next time, may the SAS be with you!