Tag: data quality

David Loshin
What is reference data harmonization?

A few weeks back I noted that one of the objectives of an inventory process for reference data was data harmonization, which meant determining when two reference sets refer to the same conceptual domain and harmonizing the contents into a conformed standard domain. Conceptually it sounds relatively straightforward, but as

Jim Harris
Errors, lies, and big data

My previous post pondered the term disestimation, coined by Charles Seife in his book Proofiness: How You’re Being Fooled by the Numbers to warn us about understating or ignoring the uncertainties surrounding a number, mistaking it for a fact instead of the error-prone estimate that it really is. Sometimes this fact appears to

Jim Harris
The Chicken Man versus the Data Scientist

In my previous post Sisyphus didn’t need a fitness tracker, I recommended that you only collect, measure and analyze big data if it helps you make a better decision or change your actions. Unfortunately, it’s difficult to know ahead of time which data will meet those criteria. We often, therefore, collect, measure and analyze

Jim Harris
Sisyphus didn’t need a fitness tracker

In his pithy style, Seth Godin’s recent blog post Analytics without action said more in 32 words than most posts say in 320 words or most white papers say in 3200 words. (For those counting along, my opening sentence alone used 32 words.) Godin’s blog post, in its entirety, stated: “Don’t measure

Dylan Jones
Lack of knowledge and the root-cause myth

A lot of data quality projects kick off in the quest for root-cause discovery. Sometimes they’ll get lucky and find a coding error or some data entry ‘finger flubs’ that are the culprit. Of course, data quality tools can help a great deal in speeding up this process by automating

Jim Harris
Data science versus narrative psychology

My previous post explained how confirmation bias can prevent you from behaving like the natural data scientist you like to imagine you are by driving your decision making toward data that confirms your existing beliefs. This post tells the story of another cognitive bias that works against data science. Consider the following scenario: Company-wide

Jim Harris
Can data change an already made up mind?

Nowadays we hear a lot about how important it is that we are data-driven in our decision-making. We also hear a lot of criticism aimed at those who are driven more by intuition than data. Like most things in life, however, there’s a big difference between theory and practice. It’s

Jim Harris
Bring the noise, boost the signal

Many people, myself included, occasionally complain about how noisy big data has made our world. While it is true that big data does broadcast more signal, not just more noise, we are not always able to tell the difference. Sometimes what sounds like meaningless background static is actually a big insight. Other times

Jim Harris
The data that supported the decision

Data-driven journalism has driven some of my recent posts. I blogged about turning anecdote into data and how being data-driven means being question-driven. The latter noted the similarity between interviewing people and interviewing data. In this post I want to examine interviewing people about data, especially the data used by people to drive

Jim Harris
Being data-driven means being question-driven

At the Journalism Interactive 2014 conference, Derek Willis spoke about interviewing data, his advice for becoming a data-driven journalist. “The bulk of the skills involved in interviewing people and interviewing data are actually pretty similar,” Willis explained. “We want to get to know it a little bit. We want to figure

Jim Harris
Survey says sampling still sensible

In my previous post, I discussed sampling error (i.e., when a randomly chosen sample doesn’t reflect the underlying population, aka margin of error) and sampling bias (i.e., when the sample isn’t randomly chosen at all), both of which big data advocates often claim can, and should, be overcome by using all the data. In this

Jim Harris
What we find in found data

In his recent Financial Times article, Tim Harford explained that the big data that interests many companies is what we might call found data – the digital exhaust from our web searches, our status updates on social networks, our credit card purchases and our mobile devices pinging the nearest cellular or WiFi network.

Jim Harris
The dark side of the mood

As an unabashed lover of data, I am thrilled to be living and working in our increasingly data-constructed world. One new type of data analysis eliciting strong emotional reactions these days is the sentiment analysis of the directly digitized feedback from customers provided via their online reviews, emails, voicemails, text messages and social networking

Jim Harris
Lean against bias for accurate analytics

We sometimes describe the potential of big data analytics as letting the data tell its story, casting the data scientist as storyteller. While the journalist has long been a newscaster, in recent years the term data-driven journalism has been adopted to describe the process of using big data analytics to

Jim Harris
Big data hubris

While big data is rife with potential, as Larry Greenemeier explained in his recent Scientific American blog post Why Big Data Isn’t Necessarily Better Data, context is often lacking when data is pulled from disparate sources, leading to questionable conclusions. His blog post examined the difficulties that Google Flu Trends

Jim Harris
What magic teaches us about data science

Teller, the normally silent half of the magician duo Penn & Teller, revealed some of magic’s secrets in a Smithsonian Magazine article about how magicians manipulate the human mind. Given the big data-fueled potential of data science to manipulate our decision-making, we should listen to what Teller has to tell

Jim Harris
What Mozart for Babies teaches us about data science

Were you a mother who listened to classical music during your pregnancy, or a parent who played classical music in your newborn baby’s nursery because you heard it stimulates creativity and improves intelligence? If so, do you know where this “classical music makes you smarter” idea came from? In 1993, a

Matthew Magne
MDM Foundations: Adding data governance to get to MDM

In my previous post, I outlined the main components needed for a phased approach to MDM. Now, let's talk about some of the other issues around approaching MDM: data governance and the move to enterprise MDM. Where does governance come in? Throughout your MDM program, it's important that deep expertise

Jim Harris
Behavioral data quality

For decades, data quality experts have been telling us poor quality is bad for our data, bad for our decisions, bad for our business and just plain all around bad, bad, bad – did I already mention it’s bad? So why does poor data quality continue to exist and persist?
