In my last post, I pointed out that an uninformed approach to running queries on top of data stored in Hadoop HDFS may lead to unexpected performance degradation for reporting and analysis. The key issue had to do with JOINs in which all the records in one data set needed
Tag: Big Data Analytics
Using Hadoop: Query optimization
A few New Year’s data resolutions
Since now is the time when we reflect on the past year and make resolutions for next year, in this post I reflect on my Data Roundtable posts from the past year and use them to offer a few New Year’s data resolutions for you and your organization to consider in
As the butter churns in Bangladesh
“Correlation does not imply causation” is a saying commonly heard in science and statistics, emphasizing that a correlation between two variables does not necessarily imply that one variable causes the other. One example of this is the relationship between rain and umbrellas. People buy more umbrellas when it rains. This