Are you a data migration sponsor? A reminder of your responsibilities.

Data migrations are never the most attractive of projects to sponsor. For those who have sponsored them previously, migrations can be seen as a poisoned chalice. As for the first-timers, data migration initiatives are often perceived as a fairly insignificant part of a far grander production.

The challenge with data migration projects, of course, is that few organisations do them regularly, so there is often a dearth of technical ability internally and even less within the sponsor community. As a result, project sponsors often have no idea what their role entails because there is no one to seek advice from.

This can be compounded by external suppliers who often claim they’re a "one-stop shop" for the migration. The reality, of course, is that hidden in the fine print of your contract are some hazy requirements around "data extraction," "data preparation," "file delivery," "data quality requirements," "extraction specification" or any number of get-out clauses for suppliers and third parties.


The Big Lebowski, dashboards, and Twitter

Like most of the bloggers for this site, I am active on Twitter. Over the past six years, I have tweeted more than 20,000 times.

Sounds like I have no life, eh?

Well, maybe, but do the math. I average about ten tweets per day. If you're trying to connect with others and occasionally promote a book or six, then that number starts to seem a little less extreme.
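The arithmetic is easy to check; here is a quick sketch using the figures from the post (assuming 365-day years and ignoring leap days):

```python
# Back-of-the-envelope check of the "about ten tweets per day" claim.
tweets = 20_000       # "more than 20,000 times"
years = 6
per_day = tweets / (years * 365)
print(round(per_day, 1))  # → 9.1
```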



Errors, lies, and big data

My previous post pondered the term disestimation, coined by Charles Seife in his book Proofiness: How You’re Being Fooled by the Numbers to warn us about understating or ignoring the uncertainties surrounding a number, mistaking it for a fact instead of the error-prone estimate that it really is.

Sometimes this fact appears to be acknowledged when numbers are presented along with a margin of error.

This, however, according to Seife, is “arguably the most misunderstood and abused mathematical concept. There are two important things to remember about the margin of error. First, the margin of error reflects the imprecision caused by statistical error—it is an unavoidable consequence of the randomness of nature. Second, the margin of error is a function of the size of the sample—the bigger the sample, the smaller the margin of error. In fact, the margin of error can be considered pretty much as nothing more than an expression of how big the sample is.”
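Seife's second point is easy to see numerically. A minimal sketch, assuming the standard textbook formula for a sampled proportion (the 1.96 z-score and worst-case p = 0.5 are conventional 95%-confidence assumptions, not figures from the book):

```python
import math

def margin_of_error(n, z=1.96):
    # Worst-case margin of error for a sampled proportion (p = 0.5)
    # at the given z-score; 1.96 corresponds to ~95% confidence.
    return z * math.sqrt(0.5 * 0.5 / n)

for n in (100, 400, 1600):
    print(n, round(margin_of_error(n), 3))
# Quadrupling the sample size halves the margin of error:
# the margin really is "an expression of how big the sample is."
```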


Challenges in harmonizing reference domains

In one of my prior posts, I briefly mentioned harmonization of reference data sets, which basically consists of determining when two reference sets refer to the same conceptual domain and blending the two data sets into a single conformed standard domain. In some cases this may be simple, especially if there is a single authoritative source for the data set. In that case, all you really need to do is align each copy of the reference data set with the authoritative source.
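That simple case can be sketched as follows, assuming a hand-maintained crosswalk from local codes to the authoritative codes (all domain values and names here are invented for illustration):

```python
def harmonize(local_codes, crosswalk, authoritative):
    """Map each local code to its conformed (authoritative) equivalent.

    Codes with no mapping are returned for stewardship review rather
    than being silently dropped.
    """
    conformed, unmapped = {}, []
    for code in local_codes:
        if code in authoritative:
            conformed[code] = code            # already on the standard
        elif code in crosswalk:
            conformed[code] = crosswalk[code]
        else:
            unmapped.append(code)
    return conformed, unmapped

authoritative = {"US", "GB", "DE"}       # the agreed standard domain
crosswalk = {"USA": "US", "UK": "GB"}    # local-to-standard mapping
conformed, unmapped = harmonize(["USA", "UK", "DE", "XX"], crosswalk, authoritative)
print(conformed)  # {'USA': 'US', 'UK': 'GB', 'DE': 'DE'}
print(unmapped)   # ['XX']
```

The harder cases, of course, are where no single authoritative source exists and the crosswalk itself has to be negotiated.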


How to extend the completeness dimension

If you’re involved in some way with data quality management then you will no doubt have had to deal with the completeness dimension.

This is often one of the starting points for organisations tackling data quality because it is easily understood and (fairly) easy to assess. Conventional wisdom has teams looking for missing values.

However, there is a problem with the way many practitioners calculate the completeness of their datasets, and it relates to an over-dependence on the default metrics provided by software. By going a little further you can deliver far more value to the business and make it easier to prioritise any long-term prevention measures required.
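One common way to go further (my sketch, not a prescribed method) is to treat placeholder values as missing, not just true nulls, since default profiling metrics usually count only the latter:

```python
# Values that pass a null check but carry no real information.
# The placeholder list is illustrative; build yours from profiling.
PLACEHOLDERS = {"", "n/a", "na", "none", "unknown", "999", "tbc"}

def completeness(values):
    """Fraction of values that are genuinely populated."""
    filled = sum(
        1 for v in values
        if v is not None and str(v).strip().lower() not in PLACEHOLDERS
    )
    return filled / len(values)

phones = ["0151 496 0000", None, "n/a", "999", "0161 496 0000"]
print(completeness(phones))  # 0.4 — a plain null count would report 0.8
```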


In defense of the indefensible

"Data doesn't matter. I know what I know."

It's a refrain that we've heard in some form for years now.

Some people want what they want when they want it, data be damned. It can be very tough to convince folks who already have their minds made up, a point that Jim Harris makes in "Can data change an already made up mind?"



Measurement and disestimation

In his book Proofiness: How You’re Being Fooled by the Numbers, Charles Seife coined the term disestimation, defining it as “the act of taking a number too literally, understating or ignoring the uncertainties that surround it. Disestimation imbues a number with more precision than it deserves, dressing a measurement up as absolute fact instead of presenting it as the error-prone estimate that it really is.”

Are you running a fever?

You know, for example, that normal body temperature is 98.6 degrees Fahrenheit (37 degrees Celsius). Using the apparent precision of that measurement standard, if you take your temperature and it exceeds 98.6 degrees, you assume you have a fever. But have you ever wondered where that measurement standard came from?
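The conversion itself hints at the problem. 37 °C converts exactly to 98.6 °F, but 37 was already a rounded figure, and carrying its rounding band through the standard conversion formula spreads "normal" across nearly two degrees Fahrenheit (the ±0.5 °C interval below is just that rounding band, my illustration):

```python
def c_to_f(c):
    # Standard Celsius-to-Fahrenheit conversion.
    return c * 9 / 5 + 32

print(round(c_to_f(37.0), 1))  # 98.6 — the familiar "normal" temperature
# A value reported as 37 °C could have been anywhere in [36.5, 37.5):
print(round(c_to_f(36.5), 1), round(c_to_f(37.5), 1))  # 97.7 99.5
```

The decimal point in 98.6 is an artifact of unit conversion, not evidence of tenth-of-a-degree precision.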


Reference data lineage

There are really two questions about reference data lineage: what are the authoritative sources for reference data and what applications use enterprise reference data?

The criticality of the question of authority for reference data sets is driven by the need for consistency of the reference values. In the absence of agreed-to authoritative sources, there is little or no governance over the sets of values that are incorporated into different versions of the reference domains. The impact is downstream inconsistency, especially in derived information products such as reports and analyses.

For example, reports may aggregate records along reference dimensions (especially hierarchical ones like product categories or geographic locations). If there are different versions of the hierarchical dimension data (sourced from a reference domain), there will be differences in the derived reports, potentially leading to confusion in the boardroom when the results of those reports are shared.
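A toy example (products, categories, and amounts all invented) of how two drifted copies of the same product hierarchy produce conflicting reports from identical transactions:

```python
from collections import defaultdict

def category_report(sales, product_to_category):
    """Aggregate sales along the product-category dimension."""
    totals = defaultdict(float)
    for product, amount in sales:
        totals[product_to_category[product]] += amount
    return dict(totals)

sales = [("cola", 100.0), ("crisps", 40.0)]
v1 = {"cola": "Beverages", "crisps": "Snacks"}     # one copy of the hierarchy
v2 = {"cola": "Soft Drinks", "crisps": "Snacks"}   # a drifted copy

print(category_report(sales, v1))  # {'Beverages': 100.0, 'Snacks': 40.0}
print(category_report(sales, v2))  # {'Soft Drinks': 100.0, 'Snacks': 40.0}
```

Same transactions, same totals, yet the two boardroom reports disagree on what the categories even are.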


How to improve your data quality history taking

Whilst it’s nice to imagine a world of perfect data quality, the reality is that most organisations deal with data quality defects on a daily basis. I’ve noticed a wide variation in the way organisations manage the life cycle of defects, and nowhere is that more apparent than in the initial information-gathering exercise that starts the data quality improvement cycle.
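As one illustration of structured "history taking" (the record and its fields below are my invention, not a prescribed standard), an intake record can force the reporter to supply evidence and business impact up front rather than leaving them to be chased later:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DefectReport:
    """Hypothetical intake record for a reported data quality defect."""
    reported_by: str
    dataset: str
    description: str
    business_impact: str        # who or what is affected, in business terms
    first_observed: date        # when the symptom was first seen
    sample_records: list = field(default_factory=list)  # evidence, not hearsay

issue = DefectReport(
    reported_by="finance-ops",
    dataset="supplier_master",
    description="Duplicate supplier records inflating spend totals",
    business_impact="Monthly spend report overstated",
    first_observed=date(2015, 3, 1),
)
print(issue.dataset)
```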


Facebook and the myth of big data perfection

When it comes to using Big Data, Facebook occupies rarefied air along with Amazon, Apple, Netflix, and Google. It's a point that I've made countless times before in my talks, books, and blog posts. But does that mean that the company has perfected its use of vast troves of mostly unstructured data?

Hardly.
