The Data Roundtable
A community of data management experts
The data lake is a great place to take a swim, but is the water clean? My colleague, Matthew Magne, compared big data to the Fire Swamp from The Princess Bride, and it can seem that foreboding. The questions we need to ask are: How was the data transformed and
In my last two posts, we concluded two things. First, because completing a JOIN query in Hadoop requires broadcasting data across the internal network, JOINs on files distributed using HDFS can suffer performance degradation. Second, there are
In my previous post, I talked about how a bank realized that data quality was central to some very basic elements of its initiatives, such as know your customer (KYC), customer onboarding and others. In this post, let’s explore what this organization did to foster an environment of data quality