Innovations in Big Data Management leveraging Hadoop

The "Internet of Things" is the latest buzzword for the machine-generated big data that is growing faster than our ability to derive value from it. Think of UPS delivering 16 million packages every day through various hubs, and all the logistics and decision-making that go into that.

But how does an organization glean insight and value from this heretofore "dark data"? You need data that you can trust to drive more effective decisions, and data quality becomes even more important as volumes increase.

The SAS Data Management offering already provides access to big data stored in Hadoop clusters via built-in transforms, the SAS/ACCESS Interface to Hadoop, and various SAS language procedures (like PROC HADOOP). Cloudera does a great job of explaining this in more detail here, and there is a great SAS Big Data Management whitepaper here.

Beyond that, here are some innovative products that give your organization the ability to better manage the three P's of big data. (Yes, the three or four V's of big data, including volume, variety, etc., can get a little cumbersome.)

1) Precision: Take processing to the data (not vice versa)

SAS, a leader in data quality, has developed the SAS Data Quality Accelerator for Teradata (with support for other engines, including Hadoop, in the pipeline). Traditionally, it wasn't prudent or even possible to clean up data in the data warehouse (DW). The volumes were too big; the batch windows were too small.

Now, we can leverage our in-database technologies to cleanse the data without moving it out of the DW. This compresses the time it takes to deduplicate customers or align product masters, resulting in higher levels of data quality. In-database processing fits the SAS Data Management theme of moving the processing to the data, instead of the data to the processing.
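To make the "processing to the data" idea concrete, here is a minimal sketch in Python. It is not the SAS Data Quality Accelerator or Teradata: an in-memory SQLite database stands in for the warehouse, and the `customers` table, its columns and the matching rule (same name and email after trimming and lowercasing) are all assumptions made for illustration. The point is only that the deduplication runs as one statement inside the database, so no rows are pulled out to the client.

```python
import sqlite3


def build_demo_db():
    """Create a tiny stand-in 'warehouse' with one duplicated customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers (name, email) VALUES (?, ?)",
        [
            ("Ann Lee", "ann@example.com"),
            (" ann lee ", "ANN@example.com"),  # same customer, messy entry
            ("Bob Ray", "bob@example.com"),
        ],
    )
    conn.commit()
    return conn


def dedupe_in_database(conn):
    """Delete duplicates inside the database and return the rows removed.

    Rows are grouped on a normalized name and email (an assumed matching
    rule); the row with the lowest id in each group is kept.
    """
    cur = conn.execute(
        """
        DELETE FROM customers
        WHERE id NOT IN (
            SELECT MIN(id)
            FROM customers
            GROUP BY LOWER(TRIM(name)), LOWER(TRIM(email))
        )
        """
    )
    conn.commit()
    return cur.rowcount


conn = build_demo_db()
deleted = dedupe_in_database(conn)
remaining = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(deleted, remaining)
```

The alternative, selecting every row into the client, matching there and writing the survivors back, moves the whole table across the network twice, which is what made small batch windows a problem in the first place.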

2) Pace: Find meaning now, not later

SAS Event Stream Processing Engine gives you the ability to extract meaning and make decisions with extremely low latency from high-volume streams of data (think hundreds of thousands of events per second) from the "Internet of Things." Instead of running queries on the data, SAS Event Stream Processing runs data through the queries to identify patterns. Customers are using this technology to conduct real-time risk assessments of financial information (like stock quotes) or to conduct early-warning maintenance on high-dollar assets like airplanes or pipelines.
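The "data through the queries" idea can be sketched in a few lines of Python. This is not the SAS Event Stream Processing API; the `ThresholdQuery` class, the three-reading window and the sensor values are hypothetical. The query is defined once and holds only a small window of state, and each event is pushed through it as it arrives rather than being stored and queried later.

```python
from collections import deque


class ThresholdQuery:
    """A standing query: flag an event when the mean of the most recent
    `window` readings exceeds `limit`."""

    def __init__(self, window, limit):
        self.readings = deque(maxlen=window)  # old readings fall off the front
        self.limit = limit

    def on_event(self, value):
        """Push one reading through the query; return True on an alert."""
        self.readings.append(value)
        window_full = len(self.readings) == self.readings.maxlen
        mean = sum(self.readings) / len(self.readings)
        return window_full and mean > self.limit


def run(stream, query):
    """Feed events to the query one at a time; return the alerting indexes."""
    return [i for i, value in enumerate(stream) if query.on_event(value)]


# Hypothetical temperature readings from one sensor on a high-value asset.
temps = [70, 71, 72, 90, 95, 99, 73, 70]
alerts = run(temps, ThresholdQuery(window=3, limit=85))
print(alerts)
```

Because the query keeps only the last few readings, memory use stays constant no matter how long the stream runs, which is what makes this style workable at high event rates.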

If you are faced with having to score more and more analytical models on larger volumes of data, try SAS Scoring Accelerator for Hadoop. Continuing with the theme of pushing the processing or math down to the database, the SAS Scoring Accelerator for Hadoop takes advantage of the SAS-embedded processing capabilities to push the scoring process down to each node of the Hadoop cluster. This helps to reduce batch windows and meet the increasing demand for operational analytics.
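As a rough illustration of pushing scoring down to the nodes, the Python sketch below applies one model to each partition where it lives and returns only the ids and scores. It is not the SAS Scoring Accelerator: the logistic model, its coefficients, the column names and the `partitions` dictionary are all hypothetical, and a plain loop stands in for the cluster's parallel execution.

```python
import math

# Hypothetical logistic-regression model; the coefficients are made up.
MODEL = {"intercept": -1.0, "weights": {"balance": 0.002, "late_payments": 0.8}}


def score_row(row, model=MODEL):
    """Return the model's probability for a single record."""
    z = model["intercept"] + sum(
        weight * row[name] for name, weight in model["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_partition(rows, model=MODEL):
    """Score one partition in place. On a real cluster this function would
    run on the node that holds the rows, so only (id, score) pairs leave it."""
    return [(row["id"], score_row(row, model)) for row in rows]


def score_cluster(partitions, model=MODEL):
    """Ship the small model to every partition and collect the small results.
    The loop below simulates what a cluster would do in parallel."""
    results = []
    for rows in partitions.values():
        results.extend(score_partition(rows, model))
    return results


# Two hypothetical nodes, each holding its own slice of the data.
partitions = {
    "node1": [{"id": 1, "balance": 500, "late_payments": 0}],
    "node2": [{"id": 2, "balance": 0, "late_payments": 3}],
}
scores = dict(score_cluster(partitions))
print(scores)
```

The model is a few numbers and the output is one score per record, so far less data moves than if every record were shipped to a central scoring server.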

3) Partnerships: Work with the experts

Complementing the SAS/ACCESS Interface to Hadoop is the SAS/ACCESS Interface to Cloudera Impala, due in the coming months. Cloudera Impala is an optimized SQL query engine that sits on top of the Cloudera distribution of Hadoop. It was created to overcome the latency issues associated with extracting data from Hadoop clusters. This new access to Impala augments all the SAS Data Management offerings and allows more immediate access to big data. A press release of the partnership is here.

As you can see, there is a lot of activity in the big data space, and SAS is at the forefront, blending traditional data management skills with cutting-edge capabilities to glean value from your big data deployment.
