Accessing Google Cloud Storage (GCS) with SAS Viya

SAS loves data. It's our raison d'être. We were dealing with Big Data long before the term was coined in 2005. A brief history of Big Data*:

  • In 1887, Herman Hollerith invented punch cards and a reader to organize census data.
  • In 1937, the US government had a punch-card reading machine created to keep track of 26 M Americans and 3 M employers as a result of the Social Security Act.
  • In 1943, Colossus was created to decipher Nazi codes during World War II.
  • In 1952, the National Security Agency was created to take on the decryption of intelligence signals during the Cold War.
  • In 1965, the US Government built the first data center to store 742 M tax returns and 175 M sets of fingerprints.
  • In 1989, British computer scientist Tim Berners-Lee coined the phrase "World Wide Web" combining hypertext with the Internet.
  • In 1995, the first supercomputer was built.
  • In 2005, Roger Mougalas of O'Reilly Media coined the term Big Data.
  • In 2006, Hadoop was created.

And the story continues: by some estimates, 90 percent of the data available today was created in the last two years!

As SAS (and the computing world) moves to the cloud, the question "How do I deal with my data (Big and otherwise), which used to be on-prem, in the cloud?" is at the forefront for many organizations. I ran across a series of relevant articles by my colleague, Nicolas Robert, on the SAS Support Communities about SAS data access and storage on Google Cloud Storage (GCS). This post organizes the articles so you can quickly get an overview of the various options for SAS to access data in GCS.

Accessing Google Cloud Storage (GCS) with SAS Viya 3.5 – An overview

As the title suggests, this is an overview of the series. Some basic SAS terminology and capabilities are discussed, followed by an overview of GCS data options for SAS. Options include:

  • gsutil - the "indirect" way
  • REST API - the "web" way
  • gcsfuse - the "dark" way
  • BigQuery - the "smart" way.

In the overview Nicolas provides the pros and cons of each offering to help you decide which option works best for your situation. Below is a list of subsequent articles providing technical details, specific steps for usage, and sample code for each option.

Accessing files on Google Cloud Storage (GCS) using REST

The Google Cloud Platform (GCP) provides an API for manipulating objects in Google Cloud Storage. In this article, Nicolas provides step-by-step instructions on using this API to access GCS files from SAS.
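To give a feel for the kind of request involved, here is a minimal Python sketch (not the SAS code from Nicolas's article) that builds a download request against the Cloud Storage JSON API. The bucket name, object name, and helper function names are hypothetical, and the OAuth 2.0 access token is assumed to have been obtained separately.

```python
from urllib.parse import quote
from urllib.request import Request

GCS_API_ROOT = "https://storage.googleapis.com/storage/v1"


def object_media_url(bucket: str, object_name: str) -> str:
    """Return the JSON API URL that downloads an object's contents."""
    # Object names are percent-encoded in full, including "/" separators.
    encoded_object = quote(object_name, safe="")
    encoded_bucket = quote(bucket, safe="")
    # alt=media asks for the object's data rather than its metadata.
    return f"{GCS_API_ROOT}/b/{encoded_bucket}/o/{encoded_object}?alt=media"


def build_download_request(bucket: str, object_name: str, access_token: str) -> Request:
    """Prepare (but do not send) an authenticated GET request."""
    # The token travels as a standard OAuth 2.0 bearer header.
    return Request(
        object_media_url(bucket, object_name),
        headers={"Authorization": f"Bearer {access_token}"},
    )


# Hypothetical names, for illustration only.
url = object_media_url("my-bucket", "data/cars.csv")
print(url)
```

Passing the prepared request to `urllib.request.urlopen` would perform the download. The same URL and header pattern applies from any HTTP client, which is what makes this the "web" way.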

Accessing files on Google Cloud Storage (GCS) using SAS Viya 3.5 and Cloud Storage FUSE (gcsfuse)

Cloud Storage FUSE provides a command-line utility, named “gcsfuse”, which helps you mount a GCS bucket to a local directory so the bucket’s contents are visible and accessible locally like any other file. In this article, Nicolas presents rules for CLI usage, options for mounting a GCS bucket to a local directory, and SAS code for accessing the data.
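The idea can be sketched in a few lines of Python (an illustration only, not the SAS code from the article). The bucket name, mount point, and helper names below are hypothetical. The sketch assumes gcsfuse is installed and that credentials are already configured on the machine.

```python
import csv
from pathlib import Path
from typing import Dict, List


def gcsfuse_mount_command(bucket: str, mount_point: str) -> List[str]:
    """Build the command line that mounts a bucket at a local directory."""
    # --implicit-dirs exposes "folders" that exist only as object name prefixes.
    return ["gcsfuse", "--implicit-dirs", bucket, mount_point]


def read_csv_rows(mount_point: str, relative_path: str) -> List[Dict[str, str]]:
    """Read a CSV file through the mount point with ordinary file I/O."""
    # Once the bucket is mounted, no GCS-specific API is needed to read it.
    with open(Path(mount_point) / relative_path, newline="") as handle:
        return list(csv.DictReader(handle))


# Hypothetical bucket and mount point; print the command rather than run it.
print(" ".join(gcsfuse_mount_command("my-bucket", "/mnt/gcs")))
```

After the mount command has been run (for example with `subprocess.run`), a call such as `read_csv_rows("/mnt/gcs", "data/cars.csv")` treats the object like a local file, which is the same property that SAS relies on when it reads from the mounted directory.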

SAS Viya 3.5 and Google Cloud Storage (GCS) Performance Feedback

In this article, Nicolas provides the results of a performance test of GCS integrated with SAS when accessed from cloud instances. New releases of SAS should further ease integration and improve performance.

Accessing files on Google Cloud Storage (GCS) through Google BigQuery

Google BigQuery naturally interacts with Google Cloud Storage using popular big data file formats (Avro, Parquet, ORC) as well as commodity file formats like CSV and JSON. Since SAS can access Google BigQuery, SAS can access those GCS resources under the covers. In the final article, Nicolas debunks the myth that using Google BigQuery as middleware between SAS and GCS is cumbersome, indirect, and requires data duplication.
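One way BigQuery can query files where they sit in GCS is an external table. The Python sketch below generates the DDL for one; it is an assumption about the general approach, not a reproduction of the article's steps. The dataset, table, and bucket names are hypothetical, and the list of accepted formats is deliberately limited to a few for illustration.

```python
def external_table_ddl(table: str, gcs_uri: str, file_format: str = "PARQUET") -> str:
    """Return BigQuery DDL defining an external table over files in GCS."""
    # A small illustrative subset of the formats BigQuery accepts.
    supported = {"AVRO", "PARQUET", "ORC", "CSV"}
    fmt = file_format.upper()
    if fmt not in supported:
        raise ValueError(f"unsupported format for this sketch: {file_format}")
    if not gcs_uri.startswith("gs://"):
        raise ValueError("gcs_uri must start with gs://")
    # The table only points at the files; the data stays in the bucket.
    return (
        f"CREATE EXTERNAL TABLE `{table}`\n"
        f"OPTIONS (\n"
        f"  format = '{fmt}',\n"
        f"  uris = ['{gcs_uri}']\n"
        f");"
    )


# Hypothetical names, for illustration only.
ddl = external_table_ddl("mydataset.cars", "gs://my-bucket/cars/*.parquet")
print(ddl)
```

Once such a table exists, any client that can query BigQuery sees the GCS files as an ordinary table, without a second copy of the data being loaded.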

Finally

Being able to access a wide variety of data on the major cloud providers' object storage technologies has become essential, if not mandatory. I encourage you to browse through the various articles, find your specific area of interest, and try out some of the detailed concepts.

* Big Data history compiled from A Short History Of Big Data, by Dr Mark van Rijmenam.

About Author

Joe Furbee

Developer Advocate

As a developer advocate and community manager at SAS, Joe serves as a liaison between the developer community and SAS technologies. He oversees developer.sas.com, which provides resources for developers on SAS and open source, and writes blogs on programming and SAS administration tips. Recently, Joe was recognized by WhiteSource software in their list of Top 20 developer advocates to follow in 2020. Joe is passionate about chronicling his journey as he expands his own knowledge-base of SAS and open source integration.
