Data quality - one dimension at a time

I was recently asked what I would focus on given limited funds and resources to kickstart a data quality initiative.

This is a great question, because I’m sure many readers will find themselves in this position at some point in their career.

My answer is to become ruthlessly focused on managing one data quality dimension - completeness.
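
To make that focus concrete, here is a minimal sketch, assuming a pandas DataFrame of customer records, of measuring completeness as the share of non-null values per column. The column names and the 90% target are my own illustrative assumptions, not from the post.

```python
import pandas as pd

# Hypothetical customer extract; column names are illustrative only.
customers = pd.DataFrame({
    "customer_id": [101, 102, 103, 104],
    "email":       ["a@example.com", None, "c@example.com", None],
    "postal_code": ["30301", "30302", None, "30305"],
})

# Completeness per column: the share of non-null values.
completeness = customers.notna().mean()

# Flag columns that fall below an (arbitrary) 90% completeness target.
target = 0.90
for column, score in completeness.items():
    status = "OK" if score >= target else "NEEDS ATTENTION"
    print(f"{column}: {score:.0%} complete ({status})")
```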

On data, breaches, names and faucets

A few months ago, I wrote a piece on this site about generic error messages. I believe they say quite a bit about an organization's data-management practices and, more generally, how much it values data.

In the post, I skewered Enterprise Rent-A-Car. Make no mistake though: plenty of companies communicate in a way that makes customers question the extent to which they value data.

Big data and the treadmill of overconfidence

In her book Mastermind: How to Think Like Sherlock Holmes, Maria Konnikova discussed four sets of circumstances that tend to make us overconfident:

  1. Familiarity — When we are dealing with familiar tasks, we feel somehow safer, thinking that we don't have the same need for caution as we would when trying something new. Each time we repeat something, we become better acquainted with it and our actions become more and more automatic, so we are less likely to put adequate thought or consideration into what we're doing.
  2. Action — As we actively engage, we become more confident in what we are doing. In one study, individuals who flipped a coin themselves, in contrast to watching someone else flip it, were more confident in being able to predict heads or tails accurately, even though, objectively, the probabilities remained unchanged.
  3. Difficulty — We tend to be under-confident on easy problems and overconfident on difficult ones. This is called the hard-easy effect. We underestimate our ability to do well when all signs point to success, and we overestimate it when the signs become less favorable.
  4. Information — When we have more information about something, we are more likely to think we can handle it, even if the additional information doesn't actually add to our knowledge in a significant way.

Reference data harmonization

We have looked at two reference data sets whose code values are distinct yet map equivalently to the same conceptual domain. We have also looked at two reference data sets whose value sets largely overlap, though not equivalently. Lastly, we began discussing guidelines for determining when reference data sets can be harmonized. In this last post of this month’s series, let’s look at some practical steps for harmonization.
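
As a rough illustration of one practical step, here is a minimal sketch, assuming two hypothetical marital-status code sets, of harmonizing both onto a single master code list for the shared conceptual domain. The codes, domain, and helper function are my own assumptions, not from the series.

```python
# Hypothetical example: harmonizing two marital-status code sets
# onto one master list for the shared conceptual domain.

# Source system A uses single letters; source system B uses numbers.
system_a_to_master = {"S": "SINGLE", "M": "MARRIED", "D": "DIVORCED", "W": "WIDOWED"}
system_b_to_master = {"1": "SINGLE", "2": "MARRIED", "3": "DIVORCED", "4": "WIDOWED",
                      "5": "SEPARATED"}  # a value with no counterpart in system A

def harmonize(code, mapping):
    """Translate a source code to the master value, flagging unmapped codes."""
    return mapping.get(code, f"UNMAPPED({code})")

print(harmonize("M", system_a_to_master))   # MARRIED
print(harmonize("5", system_b_to_master))   # SEPARATED
print(harmonize("X", system_a_to_master))   # UNMAPPED(X) -> needs a governance decision
```

Anything that lands in the unmapped bucket is exactly the kind of value that needs an explicit governance decision before the sets can be called harmonized.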

Want to improve data quality accuracy? Look for a techno-cultural shift

In a former life I would often visit sites with literally thousands of plant assets. Each asset was critical to the business, not just in terms of capital costs but in the services it provided – and the risks posed by poor maintenance or failure.

What was interesting (and troubling) in these visits was just how bad the data quality accuracy would often be.

For those new to data quality, accuracy is widely defined as a measure of how well a piece of information in a data store relates to its real-life counterpart.
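
To make that definition concrete, here is a minimal sketch, assuming a hypothetical set of stored plant-asset records and a field-verified reference from a walk-down survey, of estimating accuracy as the share of stored field values that match reality. The asset IDs, fields, and values are illustrative assumptions, not from the post.

```python
# Hypothetical example: estimating accuracy by comparing what the data store
# says about each plant asset with what a physical inspection actually found.

stored = {
    "PUMP-001": {"location": "Building A", "status": "in service"},
    "PUMP-002": {"location": "Building B", "status": "in service"},
    "VALVE-017": {"location": "Building C", "status": "decommissioned"},
}

verified = {  # results of a walk-down survey
    "PUMP-001": {"location": "Building A", "status": "in service"},
    "PUMP-002": {"location": "Building D", "status": "out of service"},
    "VALVE-017": {"location": "Building C", "status": "decommissioned"},
}

checked = matched = 0
for asset_id, record in stored.items():
    for field, value in record.items():
        checked += 1
        if verified.get(asset_id, {}).get(field) == value:
            matched += 1

print(f"Accuracy: {matched}/{checked} fields match reality ({matched / checked:.0%})")
```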

Better customer service via data

Twenty years ago, I worked as a customer service rep for SONY Electronics. Every day, I would handle about 100 calls, usually from customers upset that their products had stopped working. Without question, the most irascible were the camcorder owners, especially those whose kids' birthdays were next weekend. (Yes, this was pre-smartphone.)

I was a decent rep. Some were better; others were worse. Sometimes my manager would pull me aside and make me listen to one of my less-than-stellar calls. She would give me specific feedback on how I could have handled the situation better, but only after the fact. That is, short of dialing in to the call herself, she had no real-time way of coaching me, or anyone else for that matter.

Requirements, data quality and coffee

A panel discussion at the recent International Data Quality Summit opened with the moderator’s seemingly straightforward request that the panelists begin by defining data quality. Ronald Damhof, one of the panelists, blogged about the resulting debate.

On one side of the debate was the ISO 8000 definition of data quality, which posits that “quality data is data that meets stated requirements.” Damhof disagreed and offered an alternative definition that I will slightly paraphrase as “quality data is data that has value to some person at some time.”

Damhof’s point was that not only is quality relative, but it varies over time. “Something that is perceived as high quality by someone now,” Damhof explained, “can be perceived as being of low quality a year later. So quality is never in a fixed state, it is always moving, fluently through time.” Furthermore, Damhof argued that it is possible to “meet stated requirements (voiced by person X at time Y) but still deliver a crappy quality.” On that point, I’ll use what I love as much as data—coffee (after all, data is the new coffee)—to explain why I agree with Damhof.

Reference data set congruence

In my last post I discussed isomorphisms among reference data sets, looking at some ideas for determining that two reference data sets completely match. In that situation, there was agreement about the meaning of every value in each data set, and a one-to-one mapping of values from one data set to the other.

But what happens when you have two reference data sets that are almost isomorphic, but not exactly? In this case, you might have 100 values in data set A, 102 values in data set B, and 95 of those values map to identical value meanings in a common conceptual domain. These two reference data sets are close to being isomorphic except for the values that lie outside their intersection. If we propose a threshold for the ratio of the intersection size to the cardinality of each set, then whenever that threshold is met we can say that the two reference data sets are “congruent” or “almost equivalent.”
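
Here is a minimal sketch of that congruence test, assuming each set’s codes have already been mapped to meanings in the common conceptual domain: compare the intersection size against each set’s cardinality and a chosen threshold. The sets below mirror the 100/102/95 example; the 90% threshold is my own illustrative choice.

```python
# Hypothetical congruence check between two reference data sets, where each
# set's codes have already been mapped to meanings in a common conceptual domain.

set_a_meanings = {f"meaning_{i}" for i in range(100)}      # 100 values
set_b_meanings = {f"meaning_{i}" for i in range(5, 107)}   # 102 values, 95 shared with A

def is_congruent(a, b, threshold=0.90):
    """Sets are 'congruent' if their intersection covers at least
    `threshold` of the cardinality of both sets."""
    shared = len(a & b)
    return shared / len(a) >= threshold and shared / len(b) >= threshold

shared = len(set_a_meanings & set_b_meanings)
print(f"|A| = {len(set_a_meanings)}, |B| = {len(set_b_meanings)}, shared = {shared}")
print("Congruent at 90%?", is_congruent(set_a_meanings, set_b_meanings, 0.90))
```

With these numbers the intersection covers 95% of A and roughly 93% of B, so the two sets pass a 90% threshold and would be treated as congruent.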

How to re-balance a data migration project plan

How do you structure your data migration plan? How do you allocate resources and timescales so that you maximise your chance of success? In this article I want to explore some data migration project planning misconceptions and give you some pointers for an alternative approach.

One of the big improvements in data migration in recent years has been splitting it out as a distinct project. Historically, data migration was seen as a bit player in the target application implementation project, so it was simply bundled into the planning for the major initiative.

Does your company need extra chief officers?

"The chief analytics organization must rise. We think this is necessary to handle where analytics is heading."

So says Will Hakes, co-founder of Link Analytics, an Atlanta-based consulting firm.
