The Data Roundtable
A community of data management experts
Top 5 data quality mistakes organizations make
@philsimon lists the gravest data-quality errors.
Big data quality with continuations
I've been doing some investigation into Apache Spark, and I'm particularly intrigued by the concept of the resilient distributed dataset, or RDD. According to the Apache Spark website, an RDD is “a fault-tolerant collection of elements that can be operated on in parallel.” Two aspects of the RDD are particularly
How big of a deal is big data quality?
Data quality has always been relative and variable: it depends on the particular business use and can vary by user. Data of sufficient quality for one business use may be insufficient for another, and data that one user considers good may be considered bad by others.