Using Hadoop: Emerging options for improved query performance
In my last two posts, we drew two conclusions. First, a JOIN query in Hadoop may need to broadcast data across the internal network before it can finish executing, so JOINs over files distributed with HDFS are prone to performance degradation. Second, there are
Relationship status - Connected; Analytics for Agency
“When it comes to the Internet of Things, the future clearly belongs to the Things.” I made this brash statement in a previous post (“Cloud encounters of the Fifth Kind”), referring to machine-to-machine (M2M) communication as the fastest-growing component of non-human traffic on the Web. I say “brash” because that
Showing the ugly face of bad data: Part 2
In my previous post, I described how a bank realized that data quality was central to some of the most basic elements of its initiatives, such as know your customer (KYC) and customer onboarding, among others. In this post, let’s explore what this organization did to foster an environment of data quality