Jedi SAS Tricks: Transwarp Processing with DS2 in SAS® Viya®

When speed is required at scale, nothing beats parallel processing your data in memory. And IMHO, when parallel processing your data in memory, it's hard to beat DS2 and SAS® Viya® with the amazing Cloud Analytic Services (CAS)! While working on the second edition of "Mastering the SAS DS2 Procedure", I gained access to a small SAS Viya installation and was able to do some testing. My test case was scoring a large Teradata table using a DS2 data step. The table has over 106,000 rows and is quite wide. Running the scoring code in 4 parallel threads on SAS 9.4M5 on 64-bit Windows 10 with 4 CPUs took about 4 minutes.

I wondered how much faster it would run in-database on Teradata. After all, the Teradata instance was pretty beefy, with 576 AMPs total available for processing. I executed the same program with in-database processing enabled, and the process took only 17 seconds! Of course, the next thing I wanted to try was CAS...

The SAS Viya installation to which I had access was using CAS running distributed on two Linux x64 nodes. For CAS processing, I wanted to be sure I accounted for the time required to front-load the data from Teradata to CAS, so I could make an apples-to-apples comparison. Loading the data took about 1 second! I'd love to have data pipes like that in MY house. Running the DS2 scoring process on CAS took an additional 6 seconds to complete, for a grand total of 7 seconds. That's speed I can respect!

Now, CAS has DS2 action sets baked in, and you could learn CASL and run your DS2 that way. But it's really quite simple to run your DS2 process in CAS using our old, familiar PROC DS2.  First, we'll need to connect to CAS from our SAS 9.4 session. We'll set some system options and use a CAS statement to fire up a named CAS session we can use for further processing:

OPTIONS CASHOST="mycas.mydomain.com" CASPORT=1234;
cas mycas sessopts=(caslib=casuser timeout=1800 locale="en_US");

Now our SAS 9.4 session is aware of our CAS session named 'mycas'. Loading data to CAS is pretty easy, too.  There's a PROC for that - PROC CASUTIL. The process looks something like this:

proc casutil;
   load 
   data=db_data.campaign 
   outcaslib="casuser"
   casout="campaign" promote;
run;
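
If you want to confirm that the load worked before moving on, a quick check along these lines should do it. This is only a sketch: it assumes the "campaign" table landed in the casuser caslib as in the load step above, and it relies on the LIST and CONTENTS statements of PROC CASUTIL.

```sas
proc casutil;
   /* Show the in-memory tables in the casuser caslib; */
   /* "campaign" should appear if the load succeeded   */
   list tables incaslib="casuser";
   /* Show column names and types for the loaded table */
   contents casdata="campaign" incaslib="casuser";
quit;
```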

To run a DS2 process in CAS, make sure the resources you need are pre-loaded to the CAS session, and invoke PROC DS2 with the sessref= option:

proc ds2 sessref=mycas;

That's all there is to it! If your DS2 programs read and write to CAS tables, the code will execute distributed on CAS without any further effort on your part.
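
To make that concrete, here is a minimal sketch of what a threaded DS2 program pointed at CAS might look like. It is not my actual scoring code: the names score_th, campaign_scored and score are hypothetical, the scoring logic is just a placeholder, and I'm assuming that one-level table names resolve to the session's active caslib (casuser, as set in the CAS statement earlier).

```sas
proc ds2 sessref=mycas;
   /* Thread program: reads the CAS table loaded earlier. */
   /* score_th and score are hypothetical names.          */
   thread score_th / overwrite=yes;
      dcl double score;
      method run();
         set campaign;   /* assumed to resolve to the active caslib */
         score = 0;      /* placeholder: real scoring logic goes here */
      end;
   endthread;

   /* Data program: collects rows from the thread and writes */
   /* a new CAS table (campaign_scored is a hypothetical name) */
   data campaign_scored / overwrite=yes;
      dcl thread score_th t;
      method run();
         set from t;
      end;
   enddata;
run;
quit;
```

Notice there is no THREADS= option on the SET FROM statement. In a local SAS 9.4 session you would use it to request a thread count, but in this sketch the degree of parallelism is left to CAS.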

This is just one of the many new things that convinced me it was time to write a second edition of "Mastering the SAS DS2 Procedure: Advanced Data Wrangling Techniques".

Other new topics include:

  • more object-oriented programming information showcasing the new PRIVATE methods and variables available in packages
  • the new and blazing fast PCRX packages which put the "express" in "regular expressions"
  • a section on the new LIBS= option that takes the sting out of connection strings
  • the new DSTODS2 procedure that makes converting existing DATA step code to DS2 so much easier

and a whole lot more.

I'll soon be off to SAS Global Forum 2018 in Denver. If you're attending, come hear me present my paper "Working with Big Data in SAS" on Monday at 4:30. Or look me up in the Quad bookstore Monday evening from 5:30 to 6:30, or Tuesday from 9:00 to 9:30 / 12:30 to 13:00. Bring or buy a copy of my DS2 book, and I'll gladly sign it for you! Don't forget to say "Howdy" if we bump into each other in the Quad, Hands-on Workshops or wandering the halls. I love to meet new people and gab about SAS!

Perhaps you missed SAS Global Forum but would like to have seen me present the "Working with Big Data in SAS" paper? No worries - you can view this recording - it's just like being there :-)

Until next time, may the SAS be with you!
Mark

About Author

SAS Jedi

Principal Technical Training Consultant

Mark Jordan (a.k.a. SAS Jedi) grew up in northeast Brazil as the son of Baptist missionaries. After 20 years as a US Navy submariner pursuing his passion for programming as a hobby, in 1994 he retired, turned his hobby into a dream job, and has been a SAS programmer ever since. Mark writes and teaches a broad spectrum of SAS programming classes, and his book, "Mastering the SAS® DS2 Procedure: Advanced Data Wrangling Techniques" is in its second edition. When he isn’t writing, teaching, or posting “Jedi SAS Tricks”, Mark enjoys playing with his grand and great-grandchildren, hanging out at the beach, and reading science fiction novels. His secret obsession is flying toys – kites, rockets, drones – and though he usually tries to convince Lori that they are for the grandkids, she isn't buying it. Mark lives in historic Williamsburg, VA with his wife, Lori, and Stella, their cat. To connect with Mark, check out his SAS Press Author page, follow him on Twitter @SASJedi or connect on Facebook or LinkedIn.
