If your enterprise is working with Hadoop, MongoDB or other nontraditional databases, then you need to evaluate your data strategy. A data strategy must adapt to current data trends based on business requirements. So am I still the clean-up woman? The answer is YES!
I still work on the quality of the data. Only now, instead of doing it just for the data warehouse, we interpret and analyze output from our data profiling tools, and we reflect changes back to the enterprise source systems. (Or we should, anyway.)
For the most part, all the data strategy work we have completed in the past is still pertinent. What we must understand is that today's data is absorbed in other ways, and much faster than with our traditional data warehousing platforms. That means data preparation and integration must happen very quickly. Latency in the data may not be tolerated in these new analytic platforms.
Self-service data access typically happens using nontraditional databases (e.g., Hadoop). Analysis cannot wait for IT to land the data and prepare it for analysis. It's needed sooner! The teams using this technology are looking for all enterprise data (including data from the data warehouse) as fast as they can get it. In some cases, they're abstracting data that's sent between processes, and interpreting results prior to the end of the process. This, in itself, could change how fast we can do business.
We've always wanted to address data quality issues as close to the source as possible. With today’s technology, we'll be able to absorb and interpret the quality and integrity of the data faster than ever before.
So, cleaning up and profiling data after the process is still required. But our new data strategy must reflect changes back to the source systems, ensuring better quality data for the future. New platforms help to make this possible.
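To make the profiling idea concrete, here is a minimal sketch in plain Python. It is not the output of any particular profiling tool. The function names, the sample records, and the 30% threshold are all hypothetical, chosen only to show how a profile can point back at fields worth fixing in the source system.

```python
def profile_records(records):
    """Return the null rate and distinct-value count for each field.

    `records` is a list of dicts; None and empty strings count as missing.
    """
    total = len(records)
    fields = sorted({key for rec in records for key in rec})
    report = {}
    for field in fields:
        values = [rec.get(field) for rec in records]
        present = [v for v in values if v is not None and v != ""]
        missing = total - len(present)
        report[field] = {
            "null_rate": missing / total if total else 0.0,
            "distinct": len(set(present)),
        }
    return report


def fields_to_fix_at_source(report, max_null_rate=0.1):
    """List fields whose null rate exceeds the (assumed) tolerance."""
    return [f for f, stats in report.items() if stats["null_rate"] > max_null_rate]


# Hypothetical sample data standing in for records pulled from a source system.
records = [
    {"customer_id": 1, "email": "a@example.com", "country": "US"},
    {"customer_id": 2, "email": "", "country": "US"},
    {"customer_id": 3, "email": None, "country": "CA"},
    {"customer_id": 4, "email": "d@example.com", "country": None},
]

report = profile_records(records)
flagged = fields_to_fix_at_source(report, max_null_rate=0.3)
```

In this sample, only the email field is flagged, which is the kind of finding you would want to carry back to the team that owns the source system rather than patch downstream every time.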
Got 2 minutes? Watch Data Preparation for Analytics in the Age of Big Data.