3 Thanksgiving lessons about data warehouses, Hadoop and self-service data prep


It's that time of year again, when almost 50 million Americans travel home for Thanksgiving. We'll share a smorgasbord of turkey, stuffing and vegetables and discuss fun political topics, all to celebrate the ironic friendship between colonists and Native Americans. Being part Italian, my family augments the 20-pound turkey with pasta – more specifically, cavatelli (pronounced "cav-a-deal"). And, being part Swedish, we add Swedish meatballs to the fray.

Before the feast, these dishes have to be prepared, often using a time-tested recipe. The perfect gravy or Swedish meatball – which might have taken my nana or grandma several years of experimentation to master – is deployed and consumed in mere minutes. Later, the meal is stored away in Tupperware bins to be deployed and consumed another day.

Meals and data need to be prepared and blended while still hot.

Think of these time-tested entrees as data dishes that people across your organization are anxious to consume. Imagine business analysts, data scientists and chief data officers all sitting at the table together consuming different combinations and slices of your data. How can you serve precisely the right combinations of data to the right people while it's still hot?

One way is to use SAS Data Management, which has lots of recent updates that can help you cleanse, prepare and deploy your data faster and better than ever. 

Getting back to Thanksgiving... Let's look at three lessons Thanksgiving can teach us about data warehouses, Hadoop and self-service data prep.

1) Complement your data warehouse environment with Hadoop

The main entree at Thanksgiving meals is usually turkey. I think of this main dish as analogous to the data warehouse, which is Oracle or Teradata at many organizations. This is where you pull the bulk of your data to feed numerous data marts, analytic base tables, and other reports or analyses. And if the data warehouse is your turkey, you can think of the pasta as Apache Hadoop – the disruptive, open source, lower-cost big data technology that now often complements, and occasionally replaces, the data warehouse. TDWI found that about 17 percent of organizations are using Hadoop to complement their existing data warehouse, a figure expected to grow to 36 percent within three years.

While you may see a plate filled with turkey, pasta, meat stuffing and artichokes when you think of Thanksgiving dishes, I see something different. I see a unique, blended combination of data from your data warehouse, complemented by data from a Hadoop data lake, and sprinkled with other customer and product sales data.
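To make that blend concrete, here's a minimal sketch in generic Python with pandas (not a SAS tool – the sources, table contents and column names are all hypothetical) of joining customer sales data from a warehouse extract with clickstream counts from a data lake extract:

```python
import pandas as pd

# Hypothetical extract from the data warehouse (e.g., Oracle or Teradata).
warehouse = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "region": ["East", "West", "East"],
    "total_sales": [1200.0, 450.0, 980.0],
})

# Hypothetical extract from a Hadoop data lake (e.g., clickstream logs).
lake = pd.DataFrame({
    "customer_id": [1, 2, 4],
    "web_visits": [34, 7, 12],
})

# Blend the two sources; a left join keeps every warehouse customer,
# even those with no activity in the lake.
blended = warehouse.merge(lake, on="customer_id", how="left")

# Fill gaps for customers who never appeared in the clickstream data.
blended["web_visits"] = blended["web_visits"].fillna(0).astype(int)

print(blended)
```

The real work in a data-management platform happens at far larger scale and inside the cluster, but the shape of the task – join, reconcile, fill gaps – is the same.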

2) Speed access to data with self-service data prep

Despite how hungry some of my relatives might be to get my grandma's hard-won Swedish meatball recipe, I can get away with not sharing it with everyone. The same approach doesn't work in business. Business users need timely access to data that serves the goals of the business. Yet as they attempt to do their work, many encounter hard-to-access data silos, poor data quality, and a disjointed tool set for data preparation and business intelligence functions.

TDWI recently found that users only do their own data preparation (self-service data prep) 13 percent of the time. That leaves lots of opportunity to offload some of the work from IT and push it to the business – which can speed access to data and increase agility at the same time.
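The kind of work a self-service data prep tool takes off IT's plate is mostly cleansing and standardization. As an illustration only – this is generic Python with pandas, not a SAS directive, and the records and column names are made up – here's the sort of transformation such a tool automates behind the scenes:

```python
import pandas as pd

# Hypothetical customer records, with the messiness typical of siloed sources:
# inconsistent casing, stray whitespace, and near-duplicate entries.
raw = pd.DataFrame({
    "name": ["  Alice Smith", "BOB JONES", "alice smith "],
    "state": ["nc", "NC", "N.C."],
    "email": ["alice@example.com", None, "alice@example.com"],
})

# Standardize casing, whitespace and state abbreviations.
clean = raw.assign(
    name=raw["name"].str.strip().str.title(),
    state=raw["state"].str.replace(".", "", regex=False).str.upper(),
)

# Drop the exact duplicates that standardization reveals.
clean = clean.drop_duplicates(subset=["name", "email"])

print(clean)
```

When a business user can express steps like these through a point-and-click directive instead of a ticket to IT, the turnaround drops from days to minutes.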

3) Quickly capture and deploy self-service data prep tasks into your business

An expert cook like my grandmother might be able to put together an entire Thanksgiving feast in a matter of hours. But imagine how long it would take someone like me – I rarely cook, and I lack the skills my grandma has refined over the course of her lifetime.

As a business user, you don't have to rely on a small group of experts (IT) to prepare and serve all of your data. There are ways to automate the process so you can get the data you need, faster.

The latest version of SAS Data Management provides tools to help you create the exact combination of data you need from trusted sources – blended in just the right way, using just the right amount of preparation. I think of it as empowering business users to fill their own plates with the types of data they need to satisfy their hunger. And for the business – it all comes down to moving from data to decisions faster, while sharing insights all across the business.

Fill your plate with SAS Data Management 

  • Link data discovery and deployment with SAS Data Loader for Hadoop – our self-service data prep tool that helps business users prepare, cleanse and blend data without knowing how to write code. The latest version, 3.1, includes expanded data connectivity using SAS/ACCESS® libraries and improved collaboration – with multiuser logins and sharing of secure directives, improved governance and streamlined deployment. What's more, you can pull directives into SAS Data Integration Studio as a way to operationalize your data preparation tasks.
  • Expand cloud and in-memory data access, across more sources. While SAS Data Loader for Hadoop has access to more sources via SAS/ACCESS software, SAS Data Integration Studio can now view, read from and write to SAS Viya, our newest in-memory analytical platform. At the same time, SAS Data Integration Studio can more readily read and work with data from cloud sources including Amazon S3, sFTP, and Amazon Redshift. Plus, in-database processing is now supported on SAS/ACCESS to Amazon Redshift to minimize data movement and improve performance. And SAS/ACCESS to SQL Server is supported on Microsoft Azure.
  • Improve security and governance for Hadoop. Changes in the base platform allow SAS to ingest more metadata from more sources, including Hadoop, Google BigQuery and Tableau. So you'll get an accurate depiction of how your data assets are related, while enjoying support for the latest Hadoop distributions and related security offerings. SAS Data Loader for Hadoop can now secure, share and manage directives using SAS folders, and it uses HDFS encryption while the data is there. 

When you sit down to enjoy your favorite family meal this Thanksgiving, you might only be thinking about the turkey, meat stuffing and that pumpkin pie calling your name from across the room. But as I feast on my cavatelli and cannolis, I'll be thinking about how organizations are changing the way they manage, cleanse, prepare and deploy data. And how SAS Data Management can help them get exactly the right combination to the right people, before the data gets cold.

Happy Thanksgiving from my family to yours!

Find out what TDWI has to say about data prep for analytics. And if you're still hungry, dig in a little deeper at the SAS Data Management Community.


About Author

Matthew Magne

Principal Product Marketing Manager

@bigdatamagnet - Matthew is a TEDx speaker, musician, and Catan player. He is currently the Global Product Marketing Manager for SAS Data Management focusing on Big Data, Master Data Management, Data Quality, Data Integration and Data Governance. Previously, Matthew was an Information Management Solutions Architect at SAS, worked as a Certified Data Management Consulting IT Professional at IBM, and is a recovering software engineer and entrepreneur. Mr. Magne received his BS, cum laude, in Computer Engineering at Boston University, has done graduate work in Object Oriented Development and completed his MBA at UNCW.
