If you have your SAS Certified Platform Administrator Credential, then it’s clear that you’ve studied a lot to achieve it. But suddenly the Hadoop era shows up and what you find are big gaps in your skills inventory.
SAS administrators must be familiar with all the data sources the SAS platform can interact with, especially now that Hadoop is among them. Hadoop is not just a database. It's a different platform, and a new world compared with the SAS platform, so you can’t reuse your “old” skills when working with SAS and Hadoop (at least not all of them!).
Here are the questions we must ask:
- Should SAS administrators enroll in a formal Hadoop certification training program?
- Who is responsible for filling the skills gap (and paying for it)? You? Your boss? Your organization? Your human resources department?
If you really need some education, you can always step forward and get it on your own. Here’s how:
- Start with YouTube tutorials on Hadoop foundations. Look into channels like Hortonworks or Edureka! I personally feel more comfortable getting introductory information from expert videos because I prefer to see concepts presented visually.
- Go to support.sas.com and type “Hadoop” in the search field (top right corner). SAS has provided a lot of papers, articles, and blogs on this topic.
- Look for online or traditional Hadoop classes. The classroom gives you the chance to meet people studying Hadoop and the setting to exchange ideas with them.
- Get a Hadoop certification from Cloudera or Hortonworks.
- Go to one of the many Hadoop-related conferences such as Hadoop Summit or Strata.
Even if you’re not going to be the next SAS/Hadoop guru, you should still be familiar with the terminology and the SAS software that are part of the ecosystem.
Here’s a list of terms you should know:
HDFS: Hadoop Distributed File System
MapReduce: key component for processing data on a distributed system
Hive: SQL interface for Hadoop
Impala: newer generation of interactive SQL interface for Hadoop
YARN: Hadoop workload manager
HBase: non-relational database
Pig: language that simplifies the use of MapReduce
Spark: general purpose framework for distributed computing
Cloudera: a Hadoop distribution
Hortonworks: another Hadoop distribution
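If the MapReduce entry feels abstract, a toy example may help. The sketch below imitates the three stages of the model (map, shuffle, reduce) with a word count running in a single Python process. It is only an illustration of the idea, not how Hadoop itself is implemented or invoked: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are made up for this example, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values collected for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence (hypothetical helper for this sketch)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["SAS and Hadoop", "Hadoop and Hive"]
    print(word_count(sample))
```

The point to take away is that the map step looks at each line independently and the reduce step looks at each key independently, which is what allows a distributed system to spread the work across nodes.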
Here’s a list of SAS software you should be familiar with as a SAS administrator:
- SAS/ACCESS Interface to Hadoop
- SAS/ACCESS Interface to Impala
- SAS Visual Analytics
- SAS Embedded Process
- SAS In-Database technology
- SAS Data Loader for Hadoop
Last but not least, there is one more reason to learn Hadoop: the convergence of big data architectures. All the big data players are moving to architectures that mimic the multi-machine, multi-tenant, distributed, shared-nothing Hadoop architecture. A full understanding of the Hadoop architecture will help you face the future challenges of being a SAS platform administrator.
3 Comments
Andrea, it's very informative! Thanks for sharing this article.
Just adding to the list of SAS Software:
- Base SAS (FILENAME Hadoop, PROC HADOOP, SPDE on HDFS)
- SAS SPD Server, HDFS support
- SAS Grid Manager with YARN
And perhaps not as an administrator, but anyway:
- SAS Data Integration Studio, Hadoop and LASR transformations
Thanks.