Hadoop skills for SAS administrators – why you need them and where to start

If you have your SAS Certified Platform Administrator credential, then you have clearly studied a lot to achieve it. But then the Hadoop era arrives, and suddenly you find big gaps in your skills inventory.

SAS administrators must be familiar with all the data sources the SAS platform can interact with, especially now that Hadoop is among them. Hadoop is not just a database: it's a different platform, and a new world compared with the SAS platform, so you can’t reuse your “old” skills when working with SAS and Hadoop (at least not all of them!).

Here are the questions we must ask:

  • Should SAS administrators enroll in a formal Hadoop certification training program?
  • Who is responsible for filling the skills gap (and paying for it)? You? Your boss? Your organization? Your human resources department?

If you really need the training, you can always take the initiative and do it on your own. Here’s how:

  • Start with YouTube tutorials on Hadoop foundations. Look into channels like Hortonworks or Edureka! I personally feel more comfortable getting introductory information from expert videos because I prefer to see concepts presented visually.
  • Go to support.sas.com and type “Hadoop” in the search field (top right corner). SAS has provided a lot of papers, articles, and blogs on this topic.
  • Look for online or traditional Hadoop classes. The classroom gives you the chance to meet other people studying Hadoop and a setting in which to exchange ideas with them.
  • Get a Hadoop certification from Cloudera or Hortonworks.
  • Go to one of the many Hadoop-related conferences such as Hadoop Summit or Strata.

Even if you’re not going to be the next SAS/Hadoop guru, you should still be familiar with the terminology and the SAS software that are part of the ecosystem.

Here’s a list of terms you should know:

HDFS: Hadoop distributed file system

MapReduce: key component for processing data on a distributed system

Hive: SQL interface for Hadoop

Impala: new generation of interactive SQL interface for Hadoop

YARN: Hadoop workload manager

HBase: non-relational database

Pig: high-level language that simplifies writing MapReduce jobs

Spark: general purpose framework for distributed computing

Cloudera: a Hadoop distribution

Hortonworks: another Hadoop distribution
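
If the MapReduce entry above feels abstract, the idea can be sketched in a few lines of plain Python. This is only a single-machine illustration and does not use any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample sentences are made up for the example. On a real cluster, the map and reduce steps run in parallel on many nodes and the framework handles the grouping step in between.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Chain the three phases, as a MapReduce job would do across nodes."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored in HDFS.
    sample = ["SAS reads Hadoop data", "Hadoop stores data in HDFS"]
    print(word_count(sample))
```

Word counting is the classic first MapReduce exercise, so the same structure will look familiar when you meet it in the tutorials and classes mentioned earlier.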

Here’s a list of SAS software you should be familiar with as a SAS administrator:

  • SAS/ACCESS Interface to Hadoop
  • SAS/ACCESS Interface to Impala
  • SAS Visual Analytics
  • SAS Embedded Process
  • SAS In-Database technology
  • SAS Data Loader for Hadoop

Last but not least, there is one more reason to learn Hadoop: the convergence of big data architectures. All the big data players are moving to architectures that mimic Hadoop's multi-machine, multi-tenant, distributed, shared-nothing design. A full understanding of the Hadoop architecture will help you face the future challenges of SAS platform administration.

About Author

Andrea Negri

Senior Principal Technical Support Engineer

Andrea Negri is a Principal Technical Support Engineer helping SAS customers to address critical issues in the areas of architecture, performance and integration. On a personal level, he is a self-confessed technology addict, which helps him to identify new technology, ideas and solutions to help customers solve problems. His skills include SAS Platform installation, configuration, administration and optimization. He is currently working on SAS Platform Governance with a focus on knowledge sharing.

3 Comments

  1. Just adding to the list of SAS Software:
    - Base SAS (FILENAME Hadoop, PROC HADOOP, SPDE hdfs)
    - SAS SPD Server, hdfs support
    - SAS Grid Manager with YARN

    And perhaps not as an administrator, but anyway:
    - SAS Data Integration Studio, Hadoop and LASR transformations
