How to retrieve contents of a SAS® Scalable Performance Data Server library

4

SAS SPDS is lightning fastJust when you think you’ve seen it all, life can surprise you in a big way, making you wonder what else you've missed.

That is what happened when I recently had a chance to work with the SAS® Scalable Performance Data Server, a product that's been around a while, but I never crossed paths with before. I open an SPDS table of a hundred million records in SAS® Enterprise Guide, and I can scroll it as fast as if it were an Excel “baby” spreadsheet of a hundred rows. That’s how powerful it feels, to say nothing about the lighting speed of the queries.

What is the SAS Scalable Performance Data Server?

Also known as the SAS SPD Server (or SPDS), it's a data storage system designed for high-performance data delivery. Its primary purpose is to provide rapid queries of vast amounts of data. We are talking terabytes of data with tables containing billions of rows. SPDS employs parallel storage and efficient indexing, coupled with a multi-threaded server system concurrently processing tasks distributed across multiple processors.

Availability of the SPDS client in SAS® Viya effectively integrates SAS SPDS with SAS Viya, extending functionality of its applications beyond the native Cloud Analytic Services (CAS) where you can continue reaping all the benefits of the SAS SPDS.

SPDS library

In addition to connecting to SPD Server with explicit SQL pass-through, connection to SPD Server with a LIBNAME statement is available as well, for example:

libname mylibref sasspds 'serverdomain' host='nodename_or_ip' service='5400'
                         user='mySPDuserid' password='{SAS003}XXXXXXX...XXX';

This effectively creates an SPDS library, and the tables in that library can be referenced by two-level name mylibref.tablename as if this were a SAS BASE library.

Cluster tables vs. member tables

Besides ordinary data tables, SPDS library offers so called dynamic cluster tables – or clusters for short – enabling transparent access to large amounts of data.

Dynamic cluster tables (cluster tables or clusters) are virtual tables that allow users to access many server tables (member tables) as if they were one table. A dynamic cluster table is a collection of SPD Server tables that are presented to the end-user application as a single table through a metadata layer acting like a view.

Member tables can be added to the cluster as well as replaced and removed from the cluster.

The role of PROC SPDO

PROC SPDO is the SAS procedure for the SPD Server operator interface. It performs a wide range of SPD server, user and table management tasks:

  • create, list, modify, destroy, and undo dynamic cluster tables
  • add, remove, replace, and fix cluster table members
  • add, modify, list, and delete access control lists (ACLs) for server resources
  • define, describe, and remove WHERE constraints on tables for row-level security definition and management
  • issue system commands on server nodes

In addition to PROC SPDO, SPD Server plug-in for SAS® Management Console is also available.

Retrieving SPDS library contents

If you open an SPDS library in SAS Enterprise Guide, you won’t be able to tell which table in that library is a member table and which is a cluster table – they all look the same. But in many cases, we need to know what is what. Moreover, for data-driven processing we need to capture the SPDS library objects into a dataset and identify them whether they are clusters or member tables.

Luckily, PROC CONTENTS with OUT= option allows us to do just that. While MEMTYPE column is equal to ‘DATA’ for both, clusters and member tables, there is another, less known column inversely called TYPEMEM that has value of 'DATA' for clusters and blank value ' ' for member tables. The following simple code allows you to retrieve SPDS library objects list into WORK.SPDSTYPES dataset where TABLETYPE column specifies whether it’s a cluster or a member for each library object MEMNAME:

proc contents data=SPDSLIB._all_ out=WORK.ALLOBJECTS (keep=MEMNAME TYPEMEM);
run;
 
proc sort data=WORK.ALLOBJECTS nodupkey;
   by MEMNAME;
run;
 
data WORK.SPDSTYPES;
   set WORK.ALLOBJECT;
   attrib TABLETYPE $7 label='SPDS table type';
   select(TYPEMEM);
      when('DATA') TABLETYPE = 'CLUSTER';
      when('')     TABLETYPE = 'MEMBER';
      otherwise    TABLETYPE = '';
   end;
run;

In this code PROC CONTENTS produces one record per column NAME in every object MEMNAME in the SPDS library; PROC SORT reduces (un-duplicates) this list to one record per MEMNAME; finally, data step creates TABLETYPE column indicating which MEMNAME is CLUSTER and which is MEMBER.

Keep in mind, when an SPDS member table is added to a cluster, it "disappears" from the SPDS library - you won't see it there. On the other hand, when you remove it from the cluster, it does not get deleted, but rather becomes visible SPDS member table again.

Retrieving SPDS cluster’s member list

In addition to retrieving a list of objects in the SPDS library described above, we also need a way of capturing the content (a list of members) of the cluster itself in order to control removing or replacing its members. PROC SPDO’s CLUSTER LIST statement produces such a list, and its OUT= option allows you to dump that list into a dataset:

proc spdo lib=SPDSLIB;
   cluster list CLUSTER1 out=CLUSTER1_MEMBERS;
   cluster list CLUSTER2 out=CLUSTER2_MEMBERS;
   /* ... */
   cluster list CLUSTERN out=CLUSTERN_MEMBERS;
quit;

This approach creates one output table per cluster, and you can’t use the same OUT= destination table for different clusters, for they will be overwritten with each subsequent CLUSTER LIST statement, not appended.

If you need to capture contents of several clusters into one dataset, then instead of the above method of outputting each cluster content into separate table and then appending (concatenating) them, the good old ODS OUTPUT with CLUSTERLIST= option allows us to do it in a single step:

ods noresults;
ods output clusterlist=WORK.CLUSTER_MEMS;
proc spdo lib=SPDSLIB;
   cluster list CLUSTER1;
   cluster list CLUSTER2;
   /* ... */
   cluster list CLUSTERN;
quit;
ods output close;
ods results;

As additional bonus ODS NORESULTS suppresses printed output when it’s not needed, e.g. for automatic data-driven processing.

Your thoughts?

What is your experience with SAS SPDS? How might you use it in the future? Please comment below.

Share

About Author

Leonid Batkhan

Leonid Batkhan, Ph.D. in Computer Science and Automatic Control Systems, has been a SAS user for more than 25 years. He came to work for SAS in 1995 and is currently a Senior Consultant with the SAS Federal Data Management and Business Intelligence Practice. During his career, Leonid has successfully implemented dozens of SAS applications and projects in various industries. All posts by Leonid Batkhan >>>

4 Comments

  1. Nice writeup and intro to the product!

    Since it focus on cluster I just wish to add the important feature MINMAXVARLIST, which will let you select a single member based on your where clause with close to zero maintenance penalty.

    I'm also wondering about the product strategy in relationship to Viya. At least for SPDE, SAS has declared that SPDE access is only for legacy purposes, nothing they want Viya users start to use.

    • Leonid Batkhan

      Thank you, Linus, for your comment and adding the MINMAXVARLIST. Our readers can find this SPDS table option description here.

      In regards, to SAS Viya, as I mentioned in the 4th paragraph "Availability of the SPDS client in SAS® Viya effectively integrates SAS SPDS with SAS Viya, extending functionality of its applications beyond the native Cloud Analytic Services (CAS) where you can continue reaping all the benefits of the SAS SPDS."

      The more comprehensive quote from What’s New in SPD Server in SAS Viya and SPD Server 5.4 reads: "Beginning with SAS Viya 3.4, SPD Server client software is included in SAS Viya. SPD Server 5.41 client software is included in SAS Viya to enable you to access your existing server version 5.4 and earlier server versions in SAS Viya. SPD Server 5.41 client software has enhancements to SPD Server macro variables. There are documentation enhancements to SAS Scalable Performance Data Server: User’s Guide."

Leave A Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Back to Top