Errors, lies, and big data

My previous post pondered the term disestimation, coined by Charles Seife in his book Proofiness: How You’re Being Fooled by the Numbers to warn us about understating or ignoring the uncertainties surrounding a number, mistaking it for a fact instead of the error-prone estimate that it really is. Sometimes this fact appears to […]

Challenges in harmonizing reference domains

In one of my prior posts, I briefly mentioned harmonization of reference data sets, which essentially consists of determining when two reference sets refer to the same conceptual domain and blending the two data sets into a single conformed standard domain. In some cases this may be […]
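
The blending step can be sketched in a few lines. Everything here is a made-up illustration: the two reference sets, the codes, and the crosswalk are hypothetical, not any specific standard.

```python
# Two hypothetical reference sets covering the same conceptual domain (country),
# each using its own local codes.
system_a = {"US": "United States", "GB": "United Kingdom"}
system_b = {"USA": "United States of America", "UK": "United Kingdom"}

# Assumed crosswalk from each local code to one conformed standard code.
crosswalk = {"US": "US", "USA": "US", "GB": "GB", "UK": "GB"}

conformed = {}
for source in (system_a, system_b):
    for code, label in source.items():
        std = crosswalk[code]
        # Keep the first label encountered for each conformed code.
        conformed.setdefault(std, label)

print(conformed)
```

In practice the crosswalk itself is the hard part: building it usually means profiling both sets and resolving conflicting labels by an agreed precedence rule.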

How to extend the completeness dimension

If you’re involved in some way with data quality management, then you will no doubt have had to deal with the completeness dimension. This is often one of the starting points for organisations tackling data quality because it is easily understood and (fairly) easy to assess. Conventional wisdom has teams […]
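
A minimal sketch of that starting-point assessment, assuming pandas and an illustrative customer table (the column names and values are invented):

```python
import pandas as pd

# Hypothetical records with deliberate gaps in email and phone.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "email": ["a@example.com", None, "c@example.com", None],
    "phone": [None, "555-0102", None, "555-0104"],
})

# Per-column completeness: the share of non-null values in each column.
completeness = df.notna().mean()
print(completeness)
```

A null check like this is only the conventional first pass; values that are present but meaningless (placeholders such as "N/A") need rules of their own.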

Measurement and disestimation

In his book Proofiness: How You’re Being Fooled by the Numbers, Charles Seife coined the term disestimation, defining it as “the act of taking a number too literally, understating or ignoring the uncertainties that surround it. Disestimation imbues a number with more precision than it deserves, dressing a measurement up as absolute […]

Reference data lineage

There are really two questions about reference data lineage: what are the authoritative sources for reference data, and which applications use enterprise reference data? The question of authority for reference data sets is critical because the reference values must be consistent. In the absence of […]

How to improve your data quality history taking

Whilst it’s nice to imagine a world of perfect data quality, the reality is that most organisations will be dealing with data quality defects on a daily basis. I’ve noticed a wide variation in the way organisations manage the life cycle of defects, and nowhere is that more apparent than in […]

The Chicken Man versus the Data Scientist

In my previous post Sisyphus didn’t need a fitness tracker, I recommended that you only collect, measure and analyze big data if it helps you make a better decision or change your actions. Unfortunately, it’s difficult to know ahead of time which data will meet that criterion. We often, therefore, collect, measure and analyze […]

Re-thinking the design choices of application data quality

If we look at how most data quality initiatives start, they tend to follow a fairly common pattern:

- Data quality defects are observed by the business or technical community
- Business case for improvement is established
- Remedial improvements implemented
- Long-term monitoring and prevention recommended
- Move on to the next data landscape […]

Sisyphus didn’t need a fitness tracker

In his pithy style, Seth Godin’s recent blog post Analytics without action said more in 32 words than most posts say in 320 words or most white papers say in 3200 words. (For those counting along, my opening sentence alone used 32 words). Godin’s blog post, in its entirety, stated: “Don’t measure […]

Lack of knowledge and the root-cause myth

A lot of data quality projects kick off in the quest for root-cause discovery. Sometimes they’ll get lucky and find a coding error or some data entry ‘finger flubs’ as the culprit. Of course, data quality tools can help a great deal in speeding up this process by automating […]
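
The kind of automated check such a tool applies can be sketched with a simple pattern rule; the phone format and sample values here are invented for illustration.

```python
import re

# Hypothetical phone values containing typical data entry ‘finger flubs’.
values = ["555-0102", "555-O102", "5550-102", "555-0104"]

# Assumed valid format: three digits, a hyphen, four digits.
pattern = re.compile(r"^\d{3}-\d{4}$")

# Flag anything breaking the pattern, e.g. a letter O typed for zero
# or a misplaced hyphen.
defects = [v for v in values if not pattern.match(v)]
print(defects)
```

Spotting the defective values is the easy, automatable part; explaining *why* they were entered that way is where the root-cause quest usually stalls.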
