SAS ODPi Interoperability helps reduce risk, simplify testing, accelerate development on Hadoop

Just in time for the Strata + Hadoop World Conference, SAS became the first software vendor to achieve ODPi Interoperability with our Base SAS® and SAS/ACCESS® Interface to Hadoop products. Now, that's a lot to digest – so let me back up a second and give some background as to what this means and why it's important.

Read More »

Post a Comment

Automating operational data validation, Part 2

In my last post, we explored the operational facet of data governance and data stewardship. We focused on the challenges of providing a scalable way to assess incoming data sources, identify data quality rules and define enforceable data quality policies.

As the number of acquired data sources increases, it becomes difficult to specify data quality rules in a timely manner, let alone manage their implementation and enforcement – that is, unless you have a means of automating various aspects of the data stewardship procedures. Fortunately, you can use existing data tools to automate five critical duties of the data steward. Read More »
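To make the idea of automating rule enforcement concrete, here is a minimal, hypothetical sketch (not a SAS tool, and all rule names and fields are invented): data quality rules are declared once as predicates, then every incoming record is checked against them automatically, so the steward reviews violation counts instead of inspecting records by hand.

```python
# Hypothetical sketch: automating one data-steward duty -- validating
# incoming records against declared data quality rules.

def not_null(field):
    """Rule: the field must be present and non-null."""
    return lambda rec: rec.get(field) is not None

def in_range(field, lo, hi):
    """Rule: the field must exist and fall within [lo, hi]."""
    return lambda rec: rec.get(field) is not None and lo <= rec[field] <= hi

# Rules are declared once; enforcement is then fully automatic.
RULES = {
    "customer_id is present": not_null("customer_id"),
    "age between 0 and 120": in_range("age", 0, 120),
}

def validate(records):
    """Return a violation count per rule, so failures can be monitored."""
    violations = {name: 0 for name in RULES}
    for rec in records:
        for name, rule in RULES.items():
            if not rule(rec):
                violations[name] += 1
    return violations

incoming = [
    {"customer_id": 1, "age": 34},
    {"customer_id": None, "age": 150},
]
print(validate(incoming))
# {'customer_id is present': 1, 'age between 0 and 120': 1}
```

Because the rules are data (a dictionary), new sources can be onboarded by adding entries rather than writing new validation code -- which is the scalability point the post is making.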

Post a Comment

Why big data and streaming go hand in hand

As I've previously written, data analytics historically analyzed data after it stopped moving and was stored, often in a data warehouse. But in the era of big data, data needs to be continuously analyzed while it’s still in motion – that is, while it’s streaming. This allows for capturing the real-time value of data before it’s lost in the time lag between creation and storage – and before it’s lost in the time lag between analysis and action.

“It’s a streaming world,” David Loshin recently wrote. “In the past, much of the data that was collected and consumed for analytical purposes originated within the organization and was stored in static data repositories. Today, there is an explosion of streaming data. We have human-generated content such as data streamed from social media channels, blogs, emails, etc. We have machine-generated data from myriad sensors, devices, meters and other internet-connected machines. We have automatically generated streaming content such as web event logs. All of these sources stream massive amounts of data and are prime fodder for analysis.” Read More »
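The core mechanic of analyzing data in motion can be sketched in a few lines. This is an illustrative example only (not SAS Event Stream Processing or any specific product): a sliding window keeps just the most recent readings, so a statistic is available the moment each value arrives, before anything is stored.

```python
# Illustrative sketch: analyzing a stream while it is still in motion,
# using a fixed-size sliding window over arriving readings.
from collections import deque

def rolling_mean(stream, window=3):
    """Yield the mean of the last `window` readings as each one arrives."""
    buf = deque(maxlen=window)   # old readings fall out automatically
    for reading in stream:
        buf.append(reading)
        yield sum(buf) / len(buf)

# e.g. hypothetical sensor readings arriving one at a time
sensor = [10, 12, 11, 40, 41]
print([round(m, 2) for m in rolling_mean(sensor)])
# [10.0, 11.0, 11.0, 21.0, 30.67]
```

Note that the jump in the rolling mean shows up at the fourth reading – while the data is still flowing – rather than after a batch load completes, which is exactly the "time lag between creation and storage" the post describes eliminating.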

Post a Comment

Why an agile mind-set will lead to better data prep and analytics

While the terms and technologies have changed, analytics has been with us for a long time now.

Let's go back in time for a moment to the early 2000s. Equipped with new databases, data warehouses and even BI tools, mature organizations were eager to turn data into knowledge, insights and even action. When embarking on analytics "projects," many would contemplate the following data preparation questions:

  • Do you know what story you want to tell before you prepare the data?
  • What's the end goal?
  • What are you trying to see with your data? (Implicit in this question is the importance of working closely with people who understand the data.)

Note how simple these questions seem.
Read More »

Post a Comment

Data preparation and data wrangling, Part 2 (yippee, bring your lasso)

In Part 1 of this two-part series, I defined data preparation and data wrangling, then raised some questions about requirements gathering in a governed environment (i.e., ODS and/or data warehouse). Now – all of us very-managed people are looking at the horizon, and we see the data lake. How do we manage THAT?

As data lake usage continues to grow, we may need to modernize our thinking around different types of data and how they should be managed. The definition of data management, consequently, will take on some new aspects. For example, data in the data lake may not be checked for quality or integrated with other data – but then it must not be used for governmental and/or external reporting. That’s a rule that can be put in place… So, yep, data lake: here we come! Read More »
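A rule like that is simple to operationalize. As a hypothetical sketch (dataset names and curation levels are invented, not from any SAS product): tag each dataset with its curation level, and let reporting jobs check the tag before consuming lake data.

```python
# Hypothetical sketch: enforcing "uncurated lake data must not feed
# external reporting" via a curation tag on each dataset.

CURATION = {
    "sales_dw.orders": "quality-checked",   # governed warehouse table
    "lake.raw_clickstream": "raw",          # unchecked data lake feed
}

def usable_for_external_reporting(dataset):
    """Only quality-checked data may feed governmental/external reports."""
    return CURATION.get(dataset) == "quality-checked"

print(usable_for_external_reporting("sales_dw.orders"))       # True
print(usable_for_external_reporting("lake.raw_clickstream"))  # False
```

The point is that the lake can stay loosely governed for exploration, while a single metadata check keeps uncurated data out of the reports that carry regulatory weight.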

Post a Comment

Data validation as an operational data governance best practice, Part 1

Data governance can encompass a wide spectrum of practices, many of which are focused on the development, documentation, approval and deployment of policies associated with data management and utilization. I distinguish the facet of “operational” data governance from the fully encompassed practice to specifically focus on the operational tasks for data stewards and data quality practitioners in ensuring compliance with defined data policies.

The life cycle for data quality policies includes the determination of data validity rules, their scope of implementation, the method of measurement, institution of monitoring and the affiliated stewardship procedures. Those stewardship duties include:

  • Evaluating the source data to identify any potential data quality rules.
  • Deploying the means for validating the data against the defined rules.
  • Investigating the root causes of any identified data flaw.
  • Alerting any accountable individuals that an issue has appeared within the data flow.
  • Managing the workflow to ensure that the root cause is addressed. Read More »
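The stewardship duties above form a small workflow – validate, alert an accountable owner, and track the issue until its root cause is addressed. As a minimal, hypothetical sketch (rule names, owners and fields are all invented), that workflow might look like:

```python
# Hypothetical sketch of the stewardship workflow: validate a feed,
# alert an accountable owner on any failure, and track the issue.
from dataclasses import dataclass, field

@dataclass
class Issue:
    rule: str
    owner: str
    status: str = "open"   # open -> resolved once root cause is fixed

@dataclass
class Stewardship:
    owners: dict                       # rule name -> accountable person
    issues: list = field(default_factory=list)

    def validate(self, records, rules):
        """Check records against each rule; open and announce an issue on failure."""
        for name, rule in rules.items():
            if any(not rule(r) for r in records):
                issue = Issue(rule=name, owner=self.owners[name])
                self.issues.append(issue)
                print(f"ALERT {issue.owner}: rule '{name}' violated")

    def resolve(self, rule):
        """Close the workflow once the root cause has been addressed."""
        for issue in self.issues:
            if issue.rule == rule and issue.status != "resolved":
                issue.status = "resolved"

steward = Stewardship(owners={"amount >= 0": "finance-team"})
steward.validate([{"amount": -5}], {"amount >= 0": lambda r: r["amount"] >= 0})
# prints: ALERT finance-team: rule 'amount >= 0' violated
steward.resolve("amount >= 0")
```

Real stewardship platforms add routing, SLAs and audit trails, but the life cycle is the same loop: rule, measurement, alert, remediation.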
Post a Comment

Better tech means better analytics and less data prep

On June 19, the NBA finals concluded. The Cleveland Cavaliers and Golden State Warriors played a historically great series, culminating in a must-see seventh game that more than 30 million Americans watched. And what a game it was. By some advanced measures, we've never seen two better teams meet on the world's largest basketball stage. Read More »

Post a Comment

Data preparation and data wrangling, Part 1 (yippee, bring your lasso)

I'm a very fortunate woman. I have the privilege of working with some of the brightest people in the industry. But when it comes to data, everyone takes sides.

Do you “govern” the use of all data, or do you let the analysts do what they want with the data to arrive at conclusions that could change business? These are hard decisions that require many conversations.

Let’s examine this dilemma by starting with some definitions. Read More »

Post a Comment

Data prep should not be a one-time exercise

Intelligent organizations realize that data preparation should not be a one-time exercise. Here's the story of one organization that didn't get it.

Read More »

Post a Comment