In my last post, I talked about how data still needs to be cleaned up – and data strategy still needs to be re-evaluated – as we start to work with nontraditional databases and other new technologies.
There are lots of ways to use these new platforms (like Hadoop). For example, many organizations are using a data lake as a staging area for minimally processed data. The lake can also be the place where we analyze and profile that data – not only for data quality, but for the integrity and completeness of data as it moves between processes. Think of it as a kind of audit area. For example, maybe we made a change to the configuration of a source system, and we want to make sure the data coming out of that system is processed according to the testing and/or use scenarios (see the sketch below).

We could store data in many formats in the data lake – XML, structured data and so on. Technology is changing so quickly that it has become difficult to create a strategic vision.
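To make the audit idea concrete, here's a minimal profiling sketch in Python. It assumes the data has landed in the lake as CSV files; pandas, the file path and the expected row count are all illustrative assumptions, not a prescription for any particular platform.

```python
# A minimal profiling sketch, assuming data lands in the lake as CSV files.
# Paths, names and thresholds here are hypothetical examples.
import pandas as pd

def profile_landed_file(path: str, expected_rows: int) -> dict:
    """Profile one landed file for completeness and basic quality."""
    df = pd.read_csv(path)
    return {
        "row_count": len(df),
        # Completeness between processes: did we lose rows in transit?
        "row_count_matches_source": len(df) == expected_rows,
        # Null rate per column flags fields that arrived empty or sparse.
        "null_rate": df.isna().mean().round(3).to_dict(),
        # Duplicate rows often signal a re-run or a bad join upstream.
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Example: audit a customer extract after a source-system config change.
# print(profile_landed_file("/lake/staging/customers.csv", expected_rows=125000))
```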
We could also use the data lake as an archival platform. Heck, they do say it's cheaper – right?
If we do start loading up the data lake, information about the data residing in the lake will become critical. I would still call that "metadata," wouldn’t you? So, how the data is used and who is using the data are still essential questions – the kind a simple catalog record, like the one sketched below, can help answer.
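Here's one minimal sketch of what such a metadata record might look like. The field names and example values are hypothetical; a real catalog would track far more.

```python
# A minimal sketch of a metadata record for a dataset in the lake.
# Field names and values are hypothetical; real catalogs track far more.
from dataclasses import dataclass, field
from typing import List

@dataclass
class LakeDatasetMetadata:
    name: str                  # logical dataset name
    source_system: str         # where the data came from
    format: str                # e.g. "xml", "csv", "parquet"
    owner: str                 # who is accountable for the data
    consumers: List[str] = field(default_factory=list)   # who is using it
    known_uses: List[str] = field(default_factory=list)  # how it is used

# Example entry for the audit scenario above.
customers = LakeDatasetMetadata(
    name="customers_staging",
    source_system="CRM",
    format="csv",
    owner="data_governance_team",
    consumers=["marketing_analytics"],
    known_uses=["campaign segmentation"],
)
```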
Managing this environment will still require guidelines, governance or whatever you want to call it. We cannot just dump stuff in the lake, give everyone a fishin’ pole and say HAVE AT IT!
Questions like these arise: Do I still need tool platforms that combine data integration and data quality? Must master data management (MDM) still be considered?
In our fast-moving technological world, getting data to the right people, at the right time, with the right amount of clean-up will be key to business prosperity and expansion. New platforms and self-service are just the tip of the iceberg.
Download a paper about data management for analytics best practices