Want to improve data quality accuracy? Look for a techno-cultural shift

In a former life I would often visit sites with literally thousands of plant assets. Each asset was critical to the business, not just in terms of capital costs but in the services it provided – and the risks posed by poor maintenance or failure.

What was interesting (and troubling) about these visits was just how poor the data quality accuracy often was.

For those new to data quality, accuracy is widely defined as a measure of how well a piece of information in a data store corresponds to its real-life counterpart.
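
To make that definition concrete, here is a minimal sketch (in Python, with made-up asset data) that scores accuracy as the share of stored values matching a verified, real-world reference source:

```python
def accuracy(stored: dict, verified: dict) -> float:
    """Share of stored values that match their verified real-life counterparts."""
    if not stored:
        return 0.0
    matches = sum(1 for key, value in stored.items() if verified.get(key) == value)
    return matches / len(stored)

# Hypothetical plant-asset records, keyed by (asset ID, attribute)
stored = {("pump-7", "model"): "XR200", ("pump-7", "location"): "Hall B"}
verified = {("pump-7", "model"): "XR200", ("pump-7", "location"): "Hall C"}
print(accuracy(stored, verified))  # 0.5 -- half the stored values are accurate
```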

Better customer service via data

Twenty years ago, I worked as a customer service rep for SONY Electronics. Every day, I would handle about 100 calls, usually from customers upset that their products had stopped working. Without question, the most irascible were the camcorder owners, especially those whose kids' birthdays were next weekend. (Yes, this was pre-smartphone.)

I was a decent rep. Some were better; others were worse. Sometimes my manager would pull me aside and make me listen to one of my less-than-stellar calls. She gave me specific feedback on how I could have handled the situation better, but this was only after the fact. That is, with the exception of her dialing in to the call, there was no real-time way of coaching me, or anyone else for that matter.

Requirements, data quality and coffee

A panel discussion at the recent International Data Quality Summit opened with a seemingly straightforward request from the moderator: that the panelists begin by defining data quality. Ronald Damhof, one of the panelists, blogged about the resulting debate.

On one side of the debate was the ISO 8000 definition of data quality, which posits that “quality data is data that meets stated requirements.” Damhof disagreed and offered an alternative definition that I will slightly paraphrase as “quality data is data that has value to some person at some time.”

Damhof’s point was that not only is quality relative, but it varies over time. “Something that is perceived as high quality by someone now,” Damhof explained, “can be perceived as being of low quality a year later. So quality is never in a fixed state, it is always moving, fluently through time.” Furthermore, Damhof argued that it is possible to “meet stated requirements (voiced by person X at time Y) but still deliver a crappy quality.” On that point, I’ll use what I love as much as data—coffee (after all, data is the new coffee)—to explain why I agree with Damhof.
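
To see how a record can satisfy stated requirements yet hold little value for anyone today, consider this toy sketch (the fields and checks are hypothetical, not drawn from ISO 8000 itself):

```python
from datetime import date

def meets_requirements(record: dict, required_fields: set) -> bool:
    """Requirements-style check: the stated fields are present."""
    return required_fields <= record.keys()

def has_value_today(record: dict, today: date) -> bool:
    """Value-style check: the same record can go stale over time."""
    return record.get("valid_until", date.min) >= today

record = {"sku": "A-42", "price": 9.99, "valid_until": date(2014, 1, 1)}
print(meets_requirements(record, {"sku", "price"}))  # True: requirements met
print(has_value_today(record, date(2015, 6, 1)))     # False: stale, little value now
```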

Reference data set congruence

In my last post I discussed isomorphisms among reference data sets, where we looked at some ideas for determining that two reference data sets completely matched. In that situation, there was agreement about the meaning of every value in each of the data sets, and there was a one-to-one mapping of values from one data set to the other.

But what happens when you have two reference data sets that are almost isomorphic, but not exactly? In this case, you might have 100 values in data set A, 102 values in data set B, and 95 of those values map to identical value meanings in a common conceptual domain. These two reference data sets are close to being isomorphic except for the values that lie outside their intersection. If we set a threshold for the ratio of the intersection size to each set’s cardinality, then whenever both ratios meet that threshold we can say that the two reference data sets are “congruent” or “almost equivalent.”
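
A minimal sketch of that test, assuming each reference data set has already been reduced to a set of identifiers for its value meanings (all names here are invented):

```python
def congruent(a: set, b: set, threshold: float = 0.90) -> bool:
    """'Congruent' when the intersection covers at least `threshold`
    of each set's cardinality."""
    if not a or not b:
        return False
    shared = len(a & b)
    return shared / len(a) >= threshold and shared / len(b) >= threshold

a = {f"meaning-{i}" for i in range(100)}  # 100 values in data set A
b = (a - {f"meaning-{i}" for i in range(5)}) | {f"extra-{i}" for i in range(7)}  # 102 values, 95 shared
print(congruent(a, b))  # True: 95/100 and 95/102 both clear the 0.90 threshold
```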

How to re-balance a data migration project plan

How do you structure your data migration plan? How do you allocate resources and timescales so that you maximise your chance of success? In this article, I want to explore some common misconceptions in data migration project planning and give you some pointers towards an alternative approach.

One of the big improvements in data migration in recent years has been splitting it out as a distinct project. Historically, data migration was seen as a bit player in the target application implementation, so it was simply bundled into the planning for the larger initiative.

Does your company need extra chief officers?

"The chief analytics organization must rise. We think this is necessary to handle where analytics is heading."

So says Will Hakes, co-founder of Link Analytics, an Atlanta-based consulting firm.

As the butter churns in Bangladesh

“Correlation does not imply causation” is a saying commonly heard in science and statistics, emphasizing that a correlation between two variables does not necessarily mean that one variable causes the other.

One example of this is the relationship between rain and umbrellas. People buy more umbrellas when it rains, which establishes a strong correlation between rainy days and umbrella sales. This does not imply, however, that buying an umbrella causes it to rain—obviously it does not. Nor does it necessarily imply that umbrella sales are caused by rain. Yes, being caught unprepared by a rainstorm can cause you to buy an umbrella. But preparedness-minded people also buy umbrellas so that they are ready for a rainy day, and people buy umbrellas on (or in preparation for) sunny days to protect themselves from blinding, skin-burning sunlight.
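
Correlations like this are trivially easy to manufacture. Here is a toy sketch with synthetic data (all the numbers are invented) that produces a strong rain/umbrella correlation while saying nothing about direction or cause:

```python
import numpy as np

rng = np.random.default_rng(0)
rain = rng.random(365) < 0.3                     # synthetic daily rain flags
umbrellas = 5 + 20 * rain + rng.poisson(3, 365)  # sales jump on rainy days
r = np.corrcoef(rain, umbrellas)[0, 1]
print(f"correlation: {r:.2f}")  # strongly positive, yet silent on causation
```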

The point is that correlations are easy to find and causes are difficult to prove. It is, therefore, a correlation more often than a cause that triggers us to take what we consider to be a data-driven action.

Determining reference data set isomorphisms

In my last post, we started talking about the tasks associated with data harmonization; this week’s topic is determining whether two reference data sets refer to the same conceptual domain.

First, let’s review some definitions (a short sketch follows the list):

  • A value item is a representation of a specific value meaning in a value domain.
  • A value domain is a collection of value items.
  • A conceptual domain represents the meanings of the permissible values in a value domain.
  • A value meaning is a relation between a concept in a conceptual domain and a value item.
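
Putting those definitions together, here is a minimal sketch (hypothetical data, assuming each value domain is modeled as a mapping from value item to value meaning) of an isomorphism test:

```python
def isomorphic(a: dict, b: dict) -> bool:
    """True when the value items of A and B map one-to-one onto the
    same set of value meanings, i.e. the same conceptual domain."""
    meanings_a, meanings_b = set(a.values()), set(b.values())
    return (meanings_a == meanings_b
            and len(meanings_a) == len(a)   # each meaning appears once in A
            and len(meanings_b) == len(b))  # each meaning appears once in B

us_codes = {"M": "male", "F": "female"}
iso_codes = {"1": "male", "2": "female"}
print(isomorphic(us_codes, iso_codes))  # True: different value items, same value meanings
```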

3 (low cost) tactics for data quality improvement

When I speak to our members on Data Quality Pro, a lot of their fears revolve around budgetary issues:

  • “Will I be able to create a compelling business case for the finance steering committee?”
  • “Will our funding run out before we complete phase 1?”
  • “How can we hire new staff before we’ve demonstrated value to the business?”

Data quality management is often seen as a cost centre for an organisation, but it doesn’t need to be that way. There are many tactics you can deploy to improve the quality of your data without calling for a cast of thousands and an executive begging bowl.

DaaS Is BaaS

The explosion in enterprise technology over the past decade is perhaps rivaled only by the commensurate explosion in terminology. There's no shortage of "as a service" terms today. They include:

  • Software as a service
  • Infrastructure as a service
  • Platform as a service
  • Next-generation big data platform as a service (You know, because this generation's big data platform as a service is so dated.)
  • Database as a service
  • Service as a service
