Author

Jim Harris
Blogger-in-Chief at Obsessive-Compulsive Data Quality (OCDQ)

Jim Harris is a recognized data quality thought leader with 25 years of experience in the enterprise data management industry. He is an independent consultant, speaker, and freelance writer, and the Blogger-in-Chief at Obsessive-Compulsive Data Quality, an independent blog offering a vendor-neutral perspective on data quality and its related disciplines, including data governance, master data management, and business intelligence.

Jim Harris 0
Bring the noise, boost the signal

Many people, myself included, occasionally complain about how noisy big data has made our world. While it is true that big data does broadcast more signal, not just more noise, we are not always able to tell the difference. Sometimes what sounds like meaningless background static is actually a big insight. Other times

The ethics of algorithmic regulation

In my last three posts on data ethics, I explored a few of the ethical dilemmas in our data-driven world. From examining the ethical practices of free internet service providers to the problem of high-frequency trading, I’ve come to realize the depth and complexity of these issues. Anyone who's aware of these

The low ethics of high-frequency trading

Imagine if your ability to feed your family depended upon how fast you could run. Imagine the aisles of your grocery store as lanes on a running track. If you can outrun your fellow shoppers, grab food off the shelves and race through the checkout at the finish line, then

Mapping ethics in a data-driven world

In my previous post, I examined ethics in a data-driven world with an example of how Facebook experiments on its users. Acknowledging the conundrum facing users of free services like Facebook, Phil Simon commented that “users and customers aren’t the same thing. Maybe users are there to be, you know... used.” What about when a

Facing ethics in a data-driven world

I have previously blogged about how the dark side of our mood skews the sentiment analysis of customer feedback negatively since we usually only provide feedback when we have a negative experience with a product or service. Reading only negative reviews from its customers could make a company sad, but could reading only

Data science and decision science

Data science, as Deepinder Dhingra recently blogged, “is essentially an intersection of math and technology skills.” Individuals with these skills have been labeled data scientists and organizations are competing to hire them. “But what organizations need,” Dhingra explained, “are individuals who, in addition to math and technology, can bring in

The data that supported the decision

Data-driven journalism has driven some of my recent posts. I blogged about turning anecdote into data and how being data-driven means being question-driven. The latter noted the similarity between interviewing people and interviewing data. In this post I want to examine interviewing people about data, especially the data used by people to drive

Being data-driven means being question-driven

At the Journalism Interactive 2014 conference, Derek Willis spoke about interviewing data, his advice for becoming a data-driven journalist. “The bulk of the skills involved in interviewing people and interviewing data are actually pretty similar,” Willis explained. “We want to get to know it a little bit. We want to figure

The antimatters of MDM (part 5)

In physics, antimatter has the same mass as matter but the opposite charge. Collisions between matter and antimatter lead to the annihilation of both, the end result of which is a release of energy available to do work. In this blog series, I will use antimatter as a metaphor for a factor

A double take on sampling

My previous post made the point that it’s not a matter of whether it is good for you to use samples, but how good the sample you are using is. The comments on that post raised two different, and valid, perspectives about sampling. These viewpoints reflected two different use cases for data,

Survey says sampling still sensible

In my previous post, I discussed sampling error (i.e., when a randomly chosen sample doesn’t reflect the underlying population, aka margin of error) and sampling bias (i.e., when the sample isn’t randomly chosen at all), both of which big data advocates often claim can, and should, be overcome by using all the data. In this

What we find in found data

In his recent Financial Times article, Tim Harford explained the big data that interests many companies is what we might call found data – the digital exhaust from our web searches, our status updates on social networks, our credit card purchases and our mobile devices pinging the nearest cellular or WiFi network.

The dark side of the mood

As an unabashed lover of data, I am thrilled to be living and working in our increasingly data-constructed world. One new type of data analysis eliciting strong emotional reactions these days is the sentiment analysis of the directly digitized feedback from customers provided via their online reviews, emails, voicemails, text messages and social networking

Lean against bias for accurate analytics

We sometimes describe the potential of big data analytics as letting the data tell its story, casting the data scientist as storyteller. While the journalist has long been a newscaster, in recent years the term data-driven journalism has been adopted to describe the process of using big data analytics to

Big data hubris

While big data is rife with potential, as Larry Greenemeier explained in his recent Scientific American blog post Why Big Data Isn’t Necessarily Better Data, context is often lacking when data is pulled from disparate sources, leading to questionable conclusions. His blog post examined the difficulties that Google Flu Trends

What magic teaches us about data science

Teller, the normally silent half of the magician duo Penn & Teller, revealed some of magic’s secrets in a Smithsonian Magazine article about how magicians manipulate the human mind. Given the big data-fueled potential of data science to manipulate our decision-making, we should listen to what Teller has to tell

What Mozart for Babies teaches us about data science

Were you a mother who listened to classical music during your pregnancy, or a parent who played classical music in your newborn baby’s nursery because you heard it stimulates creativity and improves intelligence? If so, do you know where this “classical music makes you smarter” idea came from? In 1993, a

Why can’t we predict the weather?

This is the time of year when we like to make predictions about the upcoming year. Although I am optimistic about the potential of predictive analytics in the era of big data, I am also realistic about the nature of predictability regardless of how much data is used. For example, in

Behavioral data quality

For decades, data quality experts have been telling us poor quality is bad for our data, bad for our decisions, bad for our business and just plain all around bad, bad, bad – did I already mention it’s bad? So why does poor data quality continue to exist and persist?

The four noble truths of data quality

Loraine Lawson recently used the Eight-Fold Path of Buddhism, in which practitioners are encouraged to pursue right views, intentions, speech, actions, livelihood, efforts, mindfulness and concentration, as inspiration for her blog post The Five-Fold Path for Ensuring Data = Information. The post offered five recommendations for ensuring that data is transformed into

Preventing the zombie data-pocalypse

Since tomorrow is How-long-has-it-been-since-you-used-this-data-ween, it’s time to review your organization’s preparedness for preventing the zombie data-pocalypse. (Please Note: This should not be confused with your organization’s preparedness for preventing the zombie apocalypse, for which check out the resources provided by the Centers for Disease Control and Prevention by ever-so-carefully clicking on

The architects of the invisible

In the era of big data, Kenneth Cukier and Viktor Mayer-Schonberger noted in their book Big Data: A Revolution That Will Transform How We Live, Work, and Think, “we are in the midst of a great infrastructure project that in some ways rivals those of the past, from the Roman aqueducts

The antimatters of MDM (part 4)

In physics, antimatter has the same mass as matter but the opposite charge. Collisions between matter and antimatter lead to the annihilation of both, the end result of which is a release of energy available to do work. In this blog series, I will use antimatter as a metaphor for a

The antimatters of MDM (part 3)

In physics, antimatter has the same mass as matter but the opposite charge. Collisions between matter and antimatter lead to the annihilation of both, the end result of which is a release of energy available to do work. In this blog series, I will use antimatter as a metaphor for a

The antimatters of MDM (part 2)

In physics, antimatter has the same mass as matter but the opposite charge. Collisions between matter and antimatter lead to the annihilation of both, the end result of which is a release of energy available to do work. In this blog series, I will use antimatter as a metaphor for a

The antimatters of MDM (part 1)

In physics, antimatter has the same mass as matter but the opposite charge. Collisions between matter and antimatter lead to the annihilation of both, the end result of which is a release of energy available to do work. In this blog series, I will use antimatter as a metaphor for a