Data Management

Blend, cleanse and prepare data for analytics, reporting or data modernization efforts

Data Management
Jim Harris
Pushing data quality beyond boundaries

Throughout my long career of building and implementing data quality processes, I've consistently been told that data quality could not be implemented within data sources, because doing so would disrupt production systems. Therefore, source data was often copied to a central location – a staging area – where it was cleansed, transformed, unduplicated, restructured

Analytics | Customer Intelligence | Data Management
Marcelo Sukni
The future of analytics is in the hands of the data scientist

Analysts and Big Data experts around the world agree on the importance of strengthening human capital and developing better-prepared professionals. Every person has an aptitude for certain activities, such as swimming, horseback riding or tennis, or even for standing out professionally by delivering better results on particular tasks. Now

Data Management
Bill Davis
MapReduce vs. Apache Spark vs. SQL: Your questions answered here and at #StrataHadoop

As the big data era continues to evolve, Hadoop remains the workhorse for distributed computing environments. MapReduce has been the dominant workload in Hadoop, but Spark -- due to its superior in-memory performance -- is seeing rapid acceptance and growing adoption. As the Hadoop ecosystem matures, users need the flexibility to use either traditional MapReduce

Data Management
David Loshin
Big data quality with continuations

I've been doing some investigation into Apache Spark, and I'm particularly intrigued by the concept of the resilient distributed dataset, or RDD. According to the Apache Spark website, an RDD is “a fault-tolerant collection of elements that can be operated on in parallel.” Two aspects of the RDD are particularly
