Which comes first, data quality or data analytics?


While it’s obvious that chickens hatch from eggs that were laid by other chickens, what’s less obvious is which came first: the chicken or the egg? This classic conundrum has long puzzled non-scientists and scientists alike. There are almost as many people on Team Chicken, who believe the chicken came first, as there are on Team Egg, who believe the egg came first.

It turns out, however, the yolk’s on Team Chicken, since the answer is ... the egg came first.

This YouTube video does a great job of explaining why, but here are the basics. Chickens, like all species, came to be chickens through the long, slow process of evolution (i.e., gradual changes in DNA over long periods of time). In the reproductive process of animals like chickens, male and female DNA combine to form a zygote, the first cell of a new offspring that divides to create all the cells of a complete animal, with every cell containing exactly the same DNA, all of which came from the zygote. Chickens evolved from chicken-like birds over time through gradual changes (aka mutations) in DNA that created a new zygote capable of producing the first chicken. Since the zygote is the only place where DNA mutations could produce a new animal, and the first chicken zygote was housed inside the egg laid by a chicken-like bird, the egg must have come before the chicken (i.e., the egg came first).

How does this fowl dilemma relate to the pecking order of data management disciplines? Well, to me at least, it seems reminiscent of another classic conundrum:

Which comes first, data quality or data analytics?

Historically, there have been a lot more people on Team Quality than Team Analytics, meaning many more people have believed data quality comes first than have believed data analytics comes first. This makes sense. After all, analytics based on poor-quality data can lead to bad business decisions. For example, geographical profiling of customers based on inaccurate postal address data provides a false impression of where the most valuable customers live and can drive bad business decisions, such as where to focus marketing efforts. Data scientists often cite data preparation (which includes data quality assessment and improvement) as their most time-consuming task, one they must complete before they can work their statistical, algorithmic and mathematical magic.

“The egg of course,” was Richard Dawkins’s answer to the chicken or the egg question, explaining “the chicken is only an egg’s way of making another egg.” The data management equivalent might be to say data quality, of course, since analytics is only a way of making more data, which has to be of high quality to be valuable.

It would therefore seem that non-data-scientists and data scientists alike should all be on Team Quality, believing that data quality comes first. Big data, however, is the 800-pound chicken/egg in the room. (Big data is a chicken when viewed as a source producing enormous quantities of data, but big data is an egg when you consider its real value is determined by the quality of the zygote of insight contained within.)

While I am still on Team Quality in general, there are times when big data puts me on Team Analytics. Sometimes analytics must be used to evaluate big data to determine its applicability to specific business problems. Analytics, in this context, acts as an advanced filter: it identifies the most valuable big data before significant resources (time, money, people) are invested in data quality assessment and improvement. Therefore, in this increasingly common scenario, data analytics actually comes first.
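To make the "analytics as a filter" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Source` class, the keyword-based `relevance` score, the 0.5 threshold and the sample records are all hypothetical, and a real evaluation of big data sources would use far richer analytics. The point is only the ordering: a cheap analytic pass ranks the sources first, and only the ones that pass would then receive data quality assessment and improvement.

```python
from dataclasses import dataclass


@dataclass
class Source:
    """A candidate big data source (hypothetical structure for this sketch)."""
    name: str
    records: list[str]


def relevance(source: Source, keywords: set[str]) -> float:
    """Fraction of records mentioning at least one keyword (a crude, assumed score)."""
    if not source.records:
        return 0.0
    hits = sum(
        1 for record in source.records
        if keywords & set(record.lower().split())
    )
    return hits / len(source.records)


def shortlist(sources: list[Source], keywords: set[str], threshold: float = 0.5) -> list[str]:
    """Names of sources relevant enough to justify data quality work, best first."""
    scored = [(relevance(s, keywords), s.name) for s in sources]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


# Made-up sample data for a business problem about late deliveries and refunds.
sources = [
    Source("support_tickets", ["refund request late", "delivery was late", "great service"]),
    Source("web_clickstream", ["page view home", "page view cart"]),
]

# Only the shortlisted sources would move on to data quality assessment.
selected = shortlist(sources, {"late", "refund"})
print(selected)
```

In this toy setup the filter is deliberately cheap and approximate; its job is to decide where to spend data quality effort, not to replace that effort.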

What say you?

Are you on Team Quality or Team Analytics? Share your perspective on the relationship and prioritization between data quality and data analytics, and the impacts big data has had on this, by posting a comment below.


About Author

Jim Harris

Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of enterprise data management industry experience. Jim is an independent consultant, speaker, and freelance writer. Jim is the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.


2 Comments

  1. Bhaskar lakshmikanth on

    You are absolutely right. Data Quality comes first and is very critical for either using the data for some analysis or to use purely for analytics.
