Moving SAS and its data to the cloud (AKA Viya): Making those tricky data decisions


In 2019, I wrote the article “We’ve all gone to cloud -- but what about my SAS data(base)?”. At the time, containerised Viya (Viya 4) wasn’t out yet and integration with Microsoft was in its early days, so let’s revisit some of those topics and see what’s changed. There is a lot of lower-level detail out there, but I've used this article as an opportunity to look at the wider landscape and the direction of our customers, partners and SAS as we move SAS applications to be cloud-native in Viya 4.

Before I dive into the details, it's worth framing this conversation with the caveat that I’m mainly regarding the typical analytical platform data use cases in SAS, whether it's analytical data engineering (ETL) or analytics or visualisation. I’ll save the discussion on real-time and SOA-type applications of SAS for another time, as there’s a lot to be said there too. This article is about the target repository for storage. A follow-up article is planned on how we get the data up to the cloud as this has been a frequent question recently.

Let’s first look at access and connectivity for cloud-native databases. SAS continues to enhance its support for cloud-native databases such as Snowflake, Synapse and BigQuery. Updates include further pushdown of functions, wider support for output data types, a wider range of performance options (for example MODE= and the ability to specify projects on BigQuery), and single sign-on for Snowflake in Azure. There is now extensive connectivity (and in-database) support for Spark, including Databricks on Azure, GCP and AWS, as well as Microsoft’s Synapse.
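As a rough illustration of why pushdown matters, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a cloud database. It is not SAS/ACCESS code, and the `sales` table, its columns and the helper function names are hypothetical. It simply contrasts pulling every row back to the client and aggregating there with letting the database filter and aggregate, so that only the result travels over the network.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a small in-memory table that stands in for a cloud database table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    # 1,000 hypothetical rows: odd values are NORTH, even values are SOUTH.
    rows = [("NORTH" if i % 2 else "SOUTH", float(i)) for i in range(1, 1001)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def total_without_pushdown(conn: sqlite3.Connection, region: str):
    """Pull every row back to the client, then filter and aggregate locally."""
    fetched = conn.execute("SELECT region, amount FROM sales").fetchall()
    total = sum(amount for row_region, amount in fetched if row_region == region)
    return total, len(fetched)  # (result, rows transferred)


def total_with_pushdown(conn: sqlite3.Connection, region: str):
    """Push the filter and the SUM into the database; one row comes back."""
    fetched = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE region = ?", (region,)
    ).fetchall()
    return fetched[0][0], len(fetched)  # (result, rows transferred)


if __name__ == "__main__":
    conn = make_db()
    print(total_without_pushdown(conn, "NORTH"))  # (250000.0, 1000)
    print(total_with_pushdown(conn, "NORTH"))     # (250000.0, 1)
```

Both approaches return the same answer; the difference is that the pushed-down query moves one row instead of a thousand. At cloud data volumes, and with network egress between services, that gap is what makes further function pushdown worthwhile.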

It is also important to remember that SAS supports a wide range of traditional databases in their cloud-native form. For example, Google Cloud Platform Cloud SQL for SQL Server or Microsoft Azure SQL Server Big Data Clusters can be accessed via our SAS/ACCESS Interface to Microsoft SQL Server. This page provides a great summary of the 40+ cloud-native databases and stores we support, and the SAS documentation has a good summary listing to aid usage and setup.

Customers will be happy to know that SAS has added direct connectivity to the Parquet file format in SAS Compute (the v9-compatible engine), similar to the existing support for ORC. This support was originally file-based only but now includes object storage on both AWS and GCP, with Azure to follow in 2023. Why is this important? First of all, in my experience Parquet has become the default format for object-based cloud storage among customers in the UKI. Parquet is open, so your SAS, Python, Spark and other programmers can simply share data, with no conversion necessary. On the SAS front, this means you can take existing v9 programs and point them directly at data stored in object storage just by ‘changing’ the LIBNAME mapping when you run them in Viya. Parquet tables have a columnar format and can often be compressed smaller than compressed SAS datasets; I’ve often seen them at 50% of the size. However, SAS datasets still have some advantages. Features like SAS indexes and integrity constraints aren’t supported, so Parquet is not a direct replacement for SAS datasets. This documentation page details the current restrictions on Parquet support. Some of these restrictions exist because Parquet itself doesn't support those features; others, like partitions, are on SAS’ roadmap for future implementation.
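To make the columnar point a little more concrete, here is a minimal sketch using only the Python standard library. It does not write real Parquet. That would need a library such as pyarrow, and Parquet itself layers dictionary and run-length encodings underneath codecs such as Snappy or ZSTD. Instead, it serialises the same hypothetical table row by row and column by column and compresses both with zlib, used here purely as a stand-in, to show why grouping similar values together tends to help. The table contents and helper names are made up for illustration, and real-world ratios will differ.

```python
import zlib


def make_rows(n):
    """Hypothetical table: an id, two low-cardinality text columns and an amount."""
    regions = ["NORTH", "SOUTH", "EAST", "WEST"]
    return [
        (i, "ACTIVE" if i % 3 else "CLOSED", regions[i % 4], (i * 37) % 1000)
        for i in range(n)
    ]


def row_layout(rows):
    """Serialise row by row, as a CSV-like, row-oriented file would."""
    return "".join(",".join(map(str, r)) + "\n" for r in rows).encode()


def column_layout(rows):
    """Serialise column by column, so similar values sit next to each other."""
    columns = zip(*rows)
    return "".join("\n".join(map(str, col)) + "\n" for col in columns).encode()


rows = make_rows(10_000)
row_bytes = zlib.compress(row_layout(rows), 6)
col_bytes = zlib.compress(column_layout(rows), 6)

# Same number of raw bytes in both layouts; only the ordering differs.
print("raw:", len(row_layout(rows)), len(column_layout(rows)))
print("compressed row-wise:", len(row_bytes))
print("compressed column-wise:", len(col_bytes))
```

Because both layouts contain exactly the same bytes before compression, any difference in the compressed sizes comes from the ordering alone: the repetitive status and region values, laid out contiguously, compress to almost nothing in the column-oriented version.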

Object storage in the cloud gives us another potential option. When we move a SAS Grid forward to Viya (where this functionality is now called SAS Workload Management), data stored in object storage is available on all nodes, just as it would be on a clustered file system. Again, you need to look at the performance characteristics and your workload to see whether it will meet your requirements. The balance might be right, particularly if many of your steps are transitional, i.e. you source the data from S3 and some of your ETL writes its intermediate output into this shared area. Read this previous article on why I think the dynamics of the cloud change how you might design a SAS Grid when you move it to Viya.

A major improvement in Viya is native single sign-on into the Azure database ecosystem, enabled through our co-engineering work with Microsoft. As an end user, there's no need to store passwords when using FILENAME statements to access ADLS, or when connecting to databases like SQL Server or Synapse. For more details, have a look at this documentation page. Although the most advanced single sign-on features exist for SAS Viya on Microsoft Azure, SAS has added IAM integration points on AWS, for example with S3 and Redshift. According to Product Management, the security integration improvements will continue in AWS and GCP in the near future, so keep your eyes peeled for updates over this year!

One other key item to mention is SAS' recent work with SingleStore, which will give customers the option to add SingleStore as a dedicated analytics database. Our customers get an independent data and analytics solution that works at scale and is completely portable, including on-premises. The foundation for this is tight integration between SAS Viya and SingleStore, which covers tiering of storage costs, security, and streaming into the database with instant analytics, as well as interactions and transfers to SAS such as pushing SAS functionality down into the database (specialist Visual Analytics interactions, for example). Because SAS provides SingleStore as an OEM product, there is a single point of contact for technical queries, giving you accountability and access to world-class support. We already have customers deploying this and seeing a considerable TCO reduction compared with traditional best-of-breed solutions.

As two final footnotes: not everyone will be moving to the cloud in the immediate future, and SAS will continue its strong support for on-premises deployments, as illustrated by the support for on-site Kubernetes and the ability to use newer file formats like Parquet in these deployments. Equally, if you’ve moved to the cloud but not all of your SAS data has arrived yet, SAS Cloud Data Exchange is now available again in the 2023.3 release of Viya, with some net-new functionality.


About Author

Paul Jones

Head of Technology SAS UK&I

Paul has championed the cause of data analytics and AI within enterprises across the UK and Ireland. Currently, Paul heads the Technology Practice for SAS and works closely with key customers across the region, as well as supporting some EMEA-based customers. His current role is to help organisations face their AI and data challenges by adopting an enterprise-wide analytical strategy to derive value from their data. Paul enjoys helping companies shape successful outcomes in complex projects.
