In the past, we always protected our data in order to create an integrated environment for reporting and analytics. We also tried to protect people from themselves when they used and accessed data, which could sometimes be seen as a bottleneck in the process. We instituted guidelines and procedures around:
- Certification of the data for enterprise reports.
- Specific areas in the data warehouse where data could be accessed.
- Security on the layers to protect the data.
- Quality of the data (when possible).
- Integrity of the data.
With the new order of data – and with enterprises using technologies like Hadoop – where does the data strategy change? Or does it?
The quest for good-quality, integrated data has not gone away. But if we do not load our new technologies with data thoughtfully, we could end up back in the same boat we were in at the beginning of our BI endeavors, with heaps of unrelated data that may not map elegantly to anything else.
In the insurance industry, for example, most companies have multiple claim engines. These engines do NOT speak the same language, nor does the data map well between them. Integration is of the utmost importance for creating a complete view of the claim data. If we load this data without giving thought to how it will be used, it could be reported incorrectly. Claims are money, and money has to be correct in our corporation.
Hence the need for thoughtful access to the data via our new technologies. Consider loading the new technology from a trusted and integrated source (e.g., a data warehouse or operational data store), and then allowing the new users access to the new store of data. It's still important to implement procedures and guidelines surrounding the reporting of this data to internal and external users.
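To make the claim-engine point concrete, here is a minimal sketch of what "integrate first, then load" can look like. Everything in it is an assumption made for illustration: the two engines, their field names (`claim_no`, `paid`, `id`, `paid_cents`), and the `Claim` record are hypothetical and do not describe any particular product. The idea is simply that each engine's records are translated into one shared shape before anyone reports on them.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Claim:
    """Hypothetical common shape that every claim engine is mapped into."""
    claim_id: str
    source: str
    paid_amount: Decimal  # always in dollars


def from_engine_a(rec: dict) -> Claim:
    # Assumption: engine A stores the paid amount as a dollar string.
    return Claim(
        claim_id=f"A-{rec['claim_no']}",
        source="engine_a",
        paid_amount=Decimal(rec["paid"]),
    )


def from_engine_b(rec: dict) -> Claim:
    # Assumption: engine B stores the paid amount as integer cents.
    return Claim(
        claim_id=f"B-{rec['id']}",
        source="engine_b",
        paid_amount=Decimal(rec["paid_cents"]) / 100,
    )


def integrate(engine_a_rows: list, engine_b_rows: list) -> list:
    """Translate both feeds into the common shape before loading."""
    claims = [from_engine_a(r) for r in engine_a_rows]
    claims += [from_engine_b(r) for r in engine_b_rows]
    return claims


def total_paid(claims: list) -> Decimal:
    """Sum paid amounts across all integrated claims."""
    return sum((c.paid_amount for c in claims), Decimal("0"))


# Made-up sample rows, one per hypothetical engine.
sample_a = [{"claim_no": "1001", "paid": "250.00"}]
sample_b = [{"id": 77, "paid_cents": 12550}]
integrated = integrate(sample_a, sample_b)
```

Loading the raw rows side by side and summing a single "paid" column would mix dollars with cents. Mapping to a common record first, and using `Decimal` rather than floats for money, is one way to keep the totals trustworthy.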