Let us be smarter with the Internet of Things

As we enter the era of “everything connected,” we cannot forget that gathering data is not enough. We need to process that data to gain new knowledge and build our competitive advantage. The Internet of Things is not just a consumer thing – it also makes our businesses more intelligent. Whenever […]

Data management for analysis – Feeding the analytical monster more than once

(Otherwise known as Truncate – Load – Analyze – Repeat!) After you’ve prepared data for analysis and then analyzed it, how do you complete this process again? And again? And again? Most analytical applications are created to truncate the prior data, load new data for analysis, analyze it and repeat […]

Using Hadoop: Query optimization

In my last post, I pointed out that an uninformed approach to running queries on top of data stored in Hadoop HDFS may lead to unexpected performance degradation for reporting and analysis. The key issue had to do with JOINs in which all the records in one data set needed […]

A few New Year’s data resolutions

Since now is the time when we reflect on the past year and make resolutions for next year, in this post I reflect on my Data Roundtable posts from the past year and use them to offer a few New Year’s data resolutions for you and your organization to consider in […]

Big data versus the not-so-humble opinion

Henrik Liliendahl Sørensen recently blogged about the times when a HiPPO (Highest Paid Person’s Opinion) outweighs data in business decision-making. While I have seen plenty of hefty opinions trump high-quality data, those opinions did not always come from the highest paid person. The stubborn truth is that we all hold our […]

Big data and omission neglect

In my previous post, I used the book Mastermind: How to Think Like Sherlock Holmes by Maria Konnikova to explain how additional information can make us overconfident even when it doesn’t add to our knowledge in a significant way. Knowing this can help us determine how much data our decisions need to be driven […]

Big data and the treadmill of overconfidence

In her book Mastermind: How to Think Like Sherlock Holmes, Maria Konnikova discussed four sets of circumstances that tend to make us overconfident: Familiarity — When we are dealing with familiar tasks, we feel somehow safer, thinking that we don't have the same need for caution as we would when trying something […]

As the butter churns in Bangladesh

“Correlation does not imply causation” is a saying commonly heard in science and statistics emphasizing that a correlation between two variables does not necessarily imply that one variable causes the other. One example of this is the relationship between rain and umbrellas. People buy more umbrellas when it rains. This […]

Errors, lies, and big data

My previous post pondered the term disestimation, coined by Charles Seife in his book Proofiness: How You’re Being Fooled by the Numbers to warn us about understating or ignoring the uncertainties surrounding a number, mistaking it for a fact instead of the error-prone estimate that it really is. Sometimes this fact appears to […]

The Chicken Man versus the Data Scientist

In my previous post, Sisyphus didn’t need a fitness tracker, I recommended that you only collect, measure and analyze big data if it helps you make a better decision or change your actions. Unfortunately, it’s difficult to know ahead of time which data will meet that criterion. We often, therefore, collect, measure and analyze […]
