Conjunction Junction, What’s Your Function? Or, 3 Ways to Interact with Hadoop


I loved Schoolhouse Rock on Saturday mornings.  You may remember “I’m Just a Bill” or “Interplanet Janet” (she’s a galaxy girl, and pre-Pluto declassification, that’s messed up).  Some of you may have no idea what I’m talking about … so check the links or the video below.

Conjunction Junction provided by Disney Productions.

“Conjunction Junction” came to mind recently when we were discussing how systems interact with Hadoop.  The Schoolhouse Rock phrase was “And, But, and Or can get you very far”.

With regard to Hadoop, “From, With, and In” is the operative phrase.

So, the question on the table is: how do your systems interact with Hadoop?  Let me define “From, With, and In”.

“From” is accessing and extracting data from Hadoop for processing, and writing any results back to Hadoop.  I classify this as “business as usual”.  In other words, Hadoop serves as the data repository, and your systems perform their operational activity just as they would against any traditional data store.

“With” is accessing and processing Hadoop data while keeping both the data and the computations massively parallel.  I classify this as moving data to compute, not through a single “straw”, but from every Hadoop node simultaneously.

“In” is processing data directly in the Hadoop cluster.  In other words, leveraging the cycles on the Hadoop cluster to perform work.

SAS interacts with Hadoop in each of these ways.

“From” – By utilizing the SAS/ACCESS to Hadoop technology or storing your SAS data sets in SPDE format on the Hadoop cluster (a new feature in SAS’ latest release, 9.4M2), SAS can operate “business as usual”.  Data moves from storage to compute for processing, as in the sketch below.
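Here is a minimal sketch of the “From” pattern.  The server, credentials, library names, and table and column names are placeholders, so treat this as an outline rather than a drop-in program:

/* SAS/ACCESS to Hadoop: register a Hive-backed library.
   Server, port, user, and schema below are placeholders. */
libname hdp hadoop server="hive.example.com" port=10000
        user=sasdemo database=default;

/* Pull data FROM Hadoop and summarize it on the SAS server ... */
proc means data=hdp.transactions noprint;
   class region;
   var amount;
   output out=work.region_totals sum=total_amount;
run;

/* ... then write the results back to Hadoop as a new table. */
data hdp.region_totals;
   set work.region_totals;
run;

/* Alternatively, store SAS data sets in SPDE format on HDFS (9.4M2). */
libname spdehdp spde '/user/sasdemo/spde' hdfshost=default;

Either way, the heavy lifting happens on the SAS side; Hadoop simply serves as the storage layer.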

“With” – Some of the SAS In-Memory solutions can “lift” data in a massively parallel way into SAS-managed memory for computation.  Visual Analytics, Visual Statistics, In-Memory Statistics for Hadoop, and the High-Performance Analytics procedures are examples of SAS working “With” Hadoop; a small sketch follows.
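For instance, a high-performance procedure can read that same Hive table and distribute the model fitting.  The table, variables, and performance options below are placeholders, and the exact grid connection settings depend on how your SAS High-Performance Analytics environment is deployed:

/* Working WITH Hadoop: data is lifted in parallel into SAS-managed
   memory, where the computation runs massively parallel. */
proc hplogistic data=hdp.transactions;
   class region;
   model churn(event='1') = region tenure amount;
   performance nodes=all details;
run;

Because every node feeds the in-memory environment at once, the data never has to squeeze through that single “straw”.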

“In” – SAS Code Accelerator for Hadoop and SAS Data Quality Accelerator for Hadoop are examples of SAS processing data directly on each node in the Hadoop cluster.  By submitting “work” to a lightweight SAS engine on each node, SAS is able to process, manipulate, transpose, impute, and classify data (I could go on) and then write the results directly back to the node.  It is another example of massively parallel processing, as in the sketch below.
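As a sketch of the “In” pattern, the Code Accelerator pushes a DS2 thread program down to the cluster (this assumes the SAS Embedded Process is installed on the Hadoop nodes; the table and column names are placeholders):

/* SAS In-Database Code Accelerator for Hadoop: DS2ACCEL=YES requests
   that the thread program execute on the Hadoop nodes themselves. */
proc ds2 ds2accel=yes;

   thread score_th / overwrite=yes;
      dcl double high_value;
      method run();
         set hdp.transactions;
         /* a simple per-row derivation, executed on each node */
         high_value = (amount > 1000);
      end;
   endthread;

   data hdp.scored_transactions (overwrite=yes);
      dcl thread score_th t;
      method run();
         set from t;
      end;
   enddata;

run;
quit;

The results land back in Hadoop without the data ever leaving the cluster, which is the whole point of the “In” pattern.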

By having a FROM, WITH, and IN strategy, SAS is able to help our customers leverage this new data environment without significant disruption to current business processes, AND help them position themselves to take advantage of innovative processing techniques, BUT migrating should not be done without thought, OR you may find yourself with just another environment to maintain.

Did you see what I did there?  Conjunction Junction!

Learn more about Hadoop in this checklist report from TDWI.


About Author

Gary Spakes

Enterprise Architecture

As the lead for SAS' America's Enterprise Architecture Practice, Gary focuses on both the technical aspects and the business implications of the SAS High Performance Analytics platform. An IT veteran, Gary leverages his technology background to help organizations alleviate immediate pains without compromising long-term initiatives.

