Why analytical models are better with better data

Most enterprises employ multiple analytical models in their business intelligence applications and decision-making processes. These analytical models include descriptive analytics that help the organization understand what has happened and what is happening now, predictive analytics that determine the probability of what will happen next, and prescriptive analytics that focus on finding the best course of action for predicted future business scenarios.

The common denominator of all analytical models is data. And, as the TDWI Best Practices Report Improving Data Preparation for Business Analytics explained, regardless of the model used, analytics can only be as good as the underlying data. Analytics based on poor-quality data can lead to bad business decisions. For example, geographical profiling of customers based on inaccurate postal address data provides a false impression of where the most valuable customers live and could drive bad business decisions about where to focus marketing efforts.

Data governance in action

Many people have the perception that data governance is all about policies and mandates, committees and paperwork, without any real "rubber on the road" impact.

I want to dispel this viewpoint by sharing a simple example of how one company implemented data governance to enforce something practical that delivered long-term benefits for customers, stakeholders and users. The form of governance in question relates to implementing better standards and approaches toward data migration projects.

Operational data governance: Policy vs. procedure for data validation

In my prior posts about operational data governance, I've suggested the need to embed data validation as an integral component of any data integration application. In my last post, we looked at an example of using a data quality audit report to ensure fidelity of the data integration processes for loading data from a variety of sources into a data warehouse.

As it becomes apparent that the definition of data quality is in the eyes of the downstream data consumers, it's important to note the implications for data validation:

  • There may be multiple versions of “validity.” When data use is context-driven, what is valid for one data user may be irrelevant for others. Yet each data consumer is entitled to a view that conforms to the validity rules associated with the business process.
  • There may be different levels of severity for invalid data. In some cases invalid data is logged but still loaded into the target data warehouse. But in more severe situations, the data integration process fails and the invalidity has to be remediated before attempting to restart. There's a wide spectrum between those two extremes. But one common theme is the need to specify the types of invalidities, their severity, who should be notified, and what needs to happen when the issue is identified.
  • All of these specifications need to be operational simultaneously. This third implication suggests a greater level of complexity than the first two. Although different data consumer communities may have different expectations, you cannot focus on one community while ignoring the others. There must be a business directive to guarantee that all data validation rules will at least be monitored, if not assured, at the same time.
  • There must be ways to remediate invalid data without creating inconsistency. This last implication may be counterintuitive – but if we are willing to allow different constituencies to define their own expectations, it's not unreasonable to have different sets of expectations that clash with one another. Yet observing our prior implication (above) suggests that managing against the introduction of inconsistency must become a priority for ensuring enterprise data usability.
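
The implications above can be made concrete with a minimal sketch. It assumes each validation rule records which consumer community it serves, how severe a violation is, and who should be notified, and that every rule is evaluated on every record. The names `Rule`, `Severity`, `validate`, the two sample rules, and the notification targets are all hypothetical and chosen for illustration; they do not come from any particular data integration tool.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Severity(Enum):
    WARN = "warn"    # log the issue but still load the record
    BLOCK = "block"  # stop the load until the issue is remediated


@dataclass(frozen=True)
class Rule:
    name: str
    consumer: str                  # community whose expectation this encodes
    severity: Severity
    notify: str                    # who should hear about a violation
    check: Callable[[dict], bool]  # returns True when the record is valid


def validate(record: dict, rules: list[Rule]) -> tuple[bool, list[str]]:
    """Apply every rule to the record; return (ok_to_load, messages)."""
    ok_to_load = True
    messages = []
    for rule in rules:
        if rule.check(record):
            continue
        messages.append(
            f"{rule.severity.value}: {rule.name} failed for "
            f"{rule.consumer}; notify {rule.notify}"
        )
        if rule.severity is Severity.BLOCK:
            ok_to_load = False
    return ok_to_load, messages


# Hypothetical rules for two consumer communities with different expectations.
RULES = [
    Rule("postal_code_present", "marketing", Severity.WARN,
         "marketing-steward", lambda r: bool(r.get("postal_code"))),
    Rule("customer_id_present", "finance", Severity.BLOCK,
         "integration-team", lambda r: r.get("customer_id") is not None),
]

ok, msgs = validate({"customer_id": 42, "postal_code": ""}, RULES)
```

Because every rule runs on every record, no community's expectations are skipped, and a missing postal code is only logged while a missing customer ID halts the load. Remediation and the handling of clashing expectations are deliberately left out of this sketch.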

The next data governance challenge: Agility

By now, odds are that you've heard the story of how Target used data to predict customer pregnancy with astonishing accuracy. Writing for The New York Times, Charles Duhigg caused quite the stir by reporting that a Target statistician:

was able to identify about 25 products that, when analyzed together, allowed him to assign each shopper a “pregnancy prediction” score. More important, he could also estimate her due date to within a small window, so Target could send coupons timed to very specific stages of her pregnancy.

And the company did just that. As the story goes, Target knew that a 16-year-old woman was pregnant before her own father did.

How do you measure the value of data governance?

Data governance plays an integral role in many enterprise information initiatives, such as data quality, master data management and analytics. It requires coordinating a complex combination of factors, including executive sponsorship, funding, decision rights, arbitration of conflicting priorities, policy definition, policy implementation, data stewardship and change management. With so much overhead involved in running a data governance program, it’s essential to be able to measure the value of data governance.

To stream or not to stream?

Hadoop may have been the buzzword for the last few years, but streaming seems to be what everyone is talking about these days. Hadoop deals primarily with stationary big data and batch-based analytics. But modern streaming technologies are aimed at the opposite end of the spectrum, dealing with data in motion and providing analytical insights in flight.

Streaming technologies have been around for a number of years. But recently, the number and types of use cases that could take advantage of these technologies have exploded. Today, the question is not really about whether or not to stream. It’s about how to marry new streaming capabilities and approaches with emerging use cases.
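
To illustrate the contrast between batch and in-flight analytics, here is a small sketch that is not tied to Hadoop or to any streaming product. The function names `batch_mean` and `streaming_mean` and the sample readings are hypothetical. The batch version needs the whole data set at rest before it can answer, while the streaming version updates its answer as each value arrives.

```python
from __future__ import annotations

from typing import Iterable, Iterator


def batch_mean(values: list[float]) -> float:
    """Batch style: the full data set must be collected before computing."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Streaming style: emit an updated mean after every arriving value."""
    count = 0
    mean = 0.0
    for value in values:
        count += 1
        # Incremental update; earlier values do not need to be stored.
        mean += (value - mean) / count
        yield mean


readings = [10.0, 20.0, 30.0, 40.0]       # made-up sample data
running = list(streaming_mean(readings))  # one insight per event
final = batch_mean(readings)              # one insight at the end
```

Both approaches agree once all the data has been seen; the difference is that the streaming version has a usable answer after every event.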

What's the difference between data governance and data management? (Part 2)

In Part 1 of this series, we defined data governance as a framework – something an organization can implement in small pieces. Data management encompasses the disciplines included in the data governance framework. They include the following:

  • Data quality and data profiling.
  • Metadata (business, technical and operational).
  • Data security.
  • Data movement within the enterprise.
  • Data movement/usage outside of the enterprise.
  • Data stewardship or data ownership.
  • Execution of architectures (including data warehousing and big data).
  • Execution of policies and practices set forth in the data governance framework.

I'm sure there are a few more you could add, but this has become quite a large list.

What AirBNB teaches us about traditional data governance

With a valuation nearing $30B, AirBNB is a really big deal. The home-sharing service aims to disrupt the traditional hotel industry by letting every Joe and Jane Sixpack turn their homes into de facto temporary lodging.

AirBNB's business model is nothing if not innovative – perhaps too innovative for state and local legislatures. You see, many lodging statutes were conceived decades ago, long before apps and smartphones made anything remotely resembling AirBNB possible. Plenty of established industry types believe that the home-sharing service violates many of these laws. (Faced with significant opposition, AirBNB is upping its lobbying efforts.) Beyond the question of its very legality, the company faces increasing claims that homeowners use the site to discriminate against minority renters.

Operational data governance: Who owns data quality problems?

Data integration teams often find themselves in the middle of discussions where the quality of their data outputs is called into question. Without proper governance procedures in place, though, it's hard to address these accusations in a reasonable way. Here's why.
