Part 1 of this two-part series suggested a definition of data-driven design and described how technology is driving a change in mindset about how data is captured, prepared, integrated and used. In Part 2, we'll concentrate on lessons learned and things that could go wrong as you embark upon data-driven design.
You've probably seen articles about business users doing their own data preparation for an analytics project. (I'm pretty sure we wrote a few of those articles!) Some business users have skill sets that give them the luxury of pairing together data from multiple sources and creating what they need quickly. And their work may be funded by projects that require the team to do the data work without (sometimes) including IT.
Drawing from real-life lessons learned by a few of my clients, let's look at some things that could go wrong as you strive to be more data-driven.
Data redundancy
With no governance or control over where data is moved, you could end up with the same data in different data stores across the enterprise, and any of those copies could be used incorrectly. It is important to steer users toward enterprise data assets and to meet their needs without moving data around. For example, if you are creating a data lake, suggest that users look there first. Most data lake implementations receive data feeds from enterprise governed data assets. Usually, some sort of view can be created based on the business users' data requirements. For instance, a business user may want streamed data from a portal joined with strategic data to analyze how quickly an inquiry is completed.
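As a rough illustration of serving that kind of request through a view rather than another copy of the data, here is a minimal sketch using Python's built-in sqlite3 module. The table names (portal_events, inquiry_master), the columns and the sample rows are all hypothetical, and an in-memory SQLite database stands in for whatever platform you actually use.

```python
import sqlite3

# In-memory database used purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical streamed portal data
CREATE TABLE portal_events (inquiry_id INTEGER, opened_at TEXT, closed_at TEXT);
-- Hypothetical governed, strategic data
CREATE TABLE inquiry_master (inquiry_id INTEGER, product_line TEXT);

INSERT INTO portal_events VALUES
  (1, '2024-01-01 09:00:00', '2024-01-01 09:30:00'),
  (2, '2024-01-01 10:00:00', '2024-01-01 11:00:00');
INSERT INTO inquiry_master VALUES (1, 'retail'), (2, 'retail');

-- The view joins both sources in place; no data is copied to a new store.
CREATE VIEW inquiry_speed AS
SELECT m.product_line,
       (julianday(e.closed_at) - julianday(e.opened_at)) * 24 * 60
           AS minutes_to_close
FROM portal_events e
JOIN inquiry_master m ON m.inquiry_id = e.inquiry_id;
""")

# Average time to complete an inquiry, per product line.
rows = conn.execute(
    "SELECT product_line, ROUND(AVG(minutes_to_close), 1) "
    "FROM inquiry_speed GROUP BY product_line"
).fetchall()
```

The business user queries the view, while the underlying tables stay where they are governed.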
Data overload
I'm thankful that we have software to profile and inventory data as we bring it into our big data platforms. Profiling gives us the insight we need to determine how to use the data. Without it, we have no way of knowing what shape the data is in, where it came from, or how to assess its quality and integrity.
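To make "profiling" a little more concrete, here is a minimal sketch of the kind of summary such software produces. It is not any vendor's API: the function name profile_records, the source label and the sample records are assumptions made up for this example.

```python
from collections import Counter


def profile_records(records, source):
    """Build a simple per-column profile for a list of dict records."""
    columns = sorted({key for row in records for key in row})
    profile = {"source": source, "row_count": len(records), "columns": {}}
    for col in columns:
        values = [row.get(col) for row in records]
        # Treat None and empty strings as missing values.
        non_null = [v for v in values if v not in (None, "")]
        profile["columns"][col] = {
            "null_count": len(values) - len(non_null),
            "distinct_count": len(set(non_null)),
            "types": dict(Counter(type(v).__name__ for v in non_null)),
        }
    return profile


# Hypothetical sample feed
sample = [
    {"inquiry_id": 1, "channel": "portal", "amount": 120.0},
    {"inquiry_id": 2, "channel": "", "amount": None},
    {"inquiry_id": 2, "channel": "phone", "amount": 75.5},
]
report = profile_records(sample, source="portal_feed")
```

Even this small profile surfaces a repeated inquiry_id and missing values, which is the sort of signal you want before anyone starts analyzing the data.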
No governance
There may be some data that was loaded onto our big data platform early on without going through the profiling/inventory software. This data needs to be identified and processed correctly. Why did this happen? Because at the time, we weren't quite ready to govern this data, or we didn't have data governance software to help us. This sort of thing happens. We just need to remember to fix it.
No business rules on data usage
We're living in a corporate world where business rules are not always documented and may live only in the mind of someone at the organization. Data misuse will occur if there is no enterprise-wide understanding of how the data should be used. Profiling and inventorying the data will help to a certain extent, but it won't eliminate misuse. Financial controls and balances will still be required for any data pertaining to dollars that flow through our big data platform. Without these controls, you risk reporting incorrect regulatory data.
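One common form of such a control is a control-total check: compare record counts and dollar totals between the system of record and what landed on the platform. The sketch below assumes a hypothetical amount field and made-up rows, and uses Decimal so that cents are not lost to floating-point rounding.

```python
from decimal import Decimal


def reconcile(source_rows, loaded_rows, amount_field="amount"):
    """Compare record counts and dollar totals between two data sets."""
    def total(rows):
        # Convert via str so float inputs do not carry binary rounding noise.
        return sum((Decimal(str(r[amount_field])) for r in rows), Decimal("0"))

    source_total = total(source_rows)
    loaded_total = total(loaded_rows)
    return {
        "source_count": len(source_rows),
        "loaded_count": len(loaded_rows),
        "source_total": source_total,
        "loaded_total": loaded_total,
        "balanced": len(source_rows) == len(loaded_rows)
        and source_total == loaded_total,
    }


# Hypothetical ledger extract versus what was loaded on the platform
ledger = [{"amount": "100.10"}, {"amount": "250.25"}]
platform = [{"amount": "100.10"}]
result = reconcile(ledger, platform)
```

A result where balanced is False would be a reason to hold the data back from regulatory reporting until the difference is explained.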
Business users – get the data quickly
Most of my clients are using a big data platform for analysis but are continuing to use enterprise strategic data assets for corporate reporting. I believe this is a good way to get data to business users faster without compromising any regulatory requirements. Our enterprise data assets are governed – and hopefully managed and audited – for completeness, viability, regulatory standards, etc. Relying on these assets allows us to get the data to business users faster, while continuing to do corporate reporting that usually does not require real-time data.