Data quality mastery depends on change management essentials

Many managers still perceive data quality projects as a purely technical endeavour: data is the domain of IT, and therefore a data quality initiative can be mapped out on a traditional project plan with well-defined exit criteria and a clear statement of requirements.

I used to believe this myth too.

Coming from a technical background as a software engineer, I set out to create an increasingly elaborate technology toolkit to combat data quality issues.

The results were fleeting. Quick gains were achieved, but the organisation soon started to slide back into its old, comfortable ways of working.

When I finally set out a vision for where I wanted to go with data quality, and why we were heading there, things started to change. People became far more supportive when they understood their role in the bigger system. They became advocates and evangelists. Passionate for the cause.

Non-geeks want to know: will Hadoop mess up my data warehouse ecosystem?

Hadoop recently turned eight years old, but it was only 3-4 years ago that Hadoop really started gaining traction. It had many of us “older” BI/DW folks scratching our heads, wondering what Hadoop was up to and whether our tried-and-true enterprise data warehouse (EDW) ecosystems were in jeopardy. You didn't have to look hard to hear or read exclamations such as “The EDW is dead!” or questions like “Is my data warehouse a dinosaur?”

That was just three years ago.

Today, it’s not hard to find discussions—from formal research to industry events and online articles—that squelch the Hadoop dominance fears of yesteryear. The distinguished data warehousing expert Ralph Kimball joined Cloudera in a webcast earlier this year to talk about Hadoop and the EDW. In a lengthy Q&A at the end of the webcast, Dr. Kimball talked about the coexistence of these two technologies: “Conventional RDBMSs will be with us forever as they are superbly good at being OLTP engines and query targets for text and number data…Although the bigness of Big Data is impressive, it is less interesting than the variety. That is where Hadoop really makes a sustainable difference.”

Finding the signal in the Twitter noise

There's little doubt that Twitter has become a very noisy place since its inception. Its users generate billions of tweets per year, a number that has grown exponentially since co-founder Jack Dorsey tweeted "just setting up my twttr" in 2006.

For those about to steward data, we salute you

Data stewardship is one of the most important, and most commonly misunderstood, aspects of data quality and data governance. Not only is the combination of skills that makes a good data steward rare to find in a single employee, but the culture of most organizations also does not nurture the development of data stewardship.

It would not be at all unusual, therefore, if your organization does not have anyone with data steward as their official job title. Nonetheless, data stewards do exist in your organization. And you should be glad that they do.

The master entity model’s need for synchrony

Transaction systems that feed master data repositories maintain a degree of synchronization that ensures proper transaction execution. The transaction processing system generally looks at only a few records at a time, and updates are committed so that they do not interfere with other transactions. Transactions within each subsystem are isolated from other transaction processing subsystems, which guarantees freedom from interference.

Master data management entity models are designed with the presumption that updates and modifications are similarly protected from interference. However, if the inputs are not presented in the same sequence as in the original transaction systems, there is a risk of inaccurate, inconsistent, or incomplete data being committed to the master index registry. And when multiple transaction system extracts are processed by the MDM system in a single batch, you pretty much guarantee that the original synchronization order has been violated, increasing the chance of violating data fidelity expectations.
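A toy sketch makes the risk concrete. Assume a last-write-wins master record (a common, though not universal, MDM survivorship rule) and two updates to the same attribute; the `Update` class and field names here are hypothetical, purely for illustration. Applying the updates in their original sequence yields the correct final state; a batch extract that loses that sequence commits a stale value.

```python
from dataclasses import dataclass

@dataclass
class Update:
    seq: int      # the order in which the source system executed the update
    field: str
    value: str

def apply_updates(updates):
    """Build a toy master record: last write wins for each attribute."""
    master = {}
    for u in updates:
        master[u.field] = u.value
    return master

# Original transaction order: the address was entered, then corrected.
ordered = [
    Update(seq=1, field="address", value="12 Oak St"),
    Update(seq=2, field="address", value="98 Elm Ave"),
]

# A batch extract that lost the ordering delivers the same updates reversed.
batch = [ordered[1], ordered[0]]

assert apply_updates(ordered)["address"] == "98 Elm Ave"  # correct final state
assert apply_updates(batch)["address"] == "12 Oak St"     # stale value committed
```

If the extract carries a reliable sequence number or commit timestamp, sorting on it (`sorted(batch, key=lambda u: u.seq)`) restores the original synchronization order before the updates reach the master index.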

Cracking the code for successful conversions: Facilitated sessions – what do we do now?

So, the facilitated session is over. You have gathered a tremendous amount of information, and you may be wondering what to do now. I like to categorize all of that information, and hopefully during the session you did some prioritization of the requirements based on company goals.

If you have never looked at the company goals, this is a good time to dust them off. You can use them to make sure that this conversion and the requirements that surround it are aligned with company goals.

The two worlds of data quality motivation

How do you convince executives, frontline workers, middle managers and everyone else in your organization that data quality is a worthwhile activity for them to support?

The answer is to tap into their personal motivation. It is one of the most important data quality lessons to master because everything else is irrelevant if you cannot motivate people to act.

Drowning in data

"Technology is wonderful when it isn't in the way."
–Steve Hogarth

Like many readers of this blog, I think about data just about every day. I don't ponder it abstractly, though; I frequently contemplate the intersection of business, data, technology, and people. I am amazed when I see people doing things manually that could be done in a few seconds, if they only knew a better way. Lest I be hypocritical, I love it when people show me time-saving tricks.

The truth about truth

Truth is a funny thing. And I don’t just mean how some true things are funny. A few examples include that strawberries are not berries, peanuts are not nuts, Chock Full o’Nuts coffee does not contain nuts, and the singer-songwriter Barry Manilow did not write his hit song “I Write the Songs.”

Truth is also a funny thing in the sense that it can be difficult to get people to agree on what is true. Which is weird considering truth is commonly defined as that which is in accordance with reality, based on facts, accurate, and verifiable. So why is it so hard to achieve consensus about truth? For one thing, calling something true sounds absolute, not relative. If something is true, everyone, regardless of their perspective, should agree. This concept comes up frequently in discussions about data quality and master data management, often within the context of creating a single version of the truth.

The problem with blurring transaction sequences during master data ingestion

Transaction systems maintain the synchronization of the stages of their execution sequences. That is a fact; otherwise users would never trust the results of the process (think about it: you would not want to incur a service fee at your bank because your withdrawal was inadvertently processed before your deposit). And business processes are ordered as well. The company won’t ship the ordered product until the payment is processed.

But at the end of the day we collect all the transactions from all the systems and blend them together by dumping them into the MDM system. The data extracted from each transaction system reflects the result of the day’s transactions but retains little if any history of the order in which those transactions took place. And when data extracts are pooled in preparation for identity resolution and record linkage, without ensuring that the extracted data is processed in the same order as the original transactions were executed, we may, in essence, be resolving entity information in the wrong order.
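One common mitigation, where the extracts happen to carry a source commit timestamp (an assumption; many extracts do not), is to re-sort the pooled records on that timestamp before resolution. The record layout, system names, and last-write-wins resolution rule below are all hypothetical, just to show the ordering effect:

```python
# Pooled extracts from two hypothetical source systems; "ts" is the
# source system's commit timestamp, carried through the extract.
pooled = [
    {"system": "CRM",     "ts": "2014-06-03T10:05:00", "id": "C42", "name": "Jon Smith"},
    {"system": "Billing", "ts": "2014-06-03T09:00:00", "id": "C42", "name": "John Smith"},
]

def resolve(records):
    """Toy last-write-wins resolution keyed on the shared identifier."""
    index = {}
    for rec in records:
        index[rec["id"]] = rec["name"]
    return index

# Processed in arrival order, the older Billing record is applied last,
# so its stale name survives in the master index.
assert resolve(pooled)["C42"] == "John Smith"

# Re-sorting on the commit timestamp restores the original execution
# order, and the latest CRM correction wins as intended.
assert resolve(sorted(pooled, key=lambda r: r["ts"]))["C42"] == "Jon Smith"
```

ISO 8601 timestamps sort correctly as plain strings, which is why the `key` needs no parsing; extracts without any sequence or timestamp column leave no such repair option, which is the deeper problem this post describes.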
