Recently, the EU’s General Data Protection Regulation (GDPR) and other regulations have introduced new governance requirements for data management. These have had some interesting effects on the data preparation process. This post is the third in a series on data preparation (Data preparation in the Analytics Life Cycle, Trends in Data Preparation). It looks at recent changes in the field and how these have affected practice, and it draws out some important lessons about the place of data preparation in the analytics life cycle.
Data must be governed – and that covers data preparation
Data preparation may be a relatively new area for many businesses, particularly as a discipline in its own right. However, data preparation processes and practices must still comply with the organisation’s data governance processes and rules. The same applies to data integration and data management solutions: all data-related processes must fit within the organisation’s overall data governance framework.
Why does this matter? Analytics depends on different user groups, including IT, data scientists and business analysts, working together across the analytics life cycle. They need to be operating with the same data and applying the same principles, or the outcome of the analytical modelling is likely to be at best ambiguous and at worst downright wrong.
This collaboration must follow – and be driven by – data governance principles. In other words, data governance is a key part of the process and should be used to enable better cooperation and joint working. It certainly should not be seen as a hindrance to be overcome or circumvented by any means possible.
Governance can support shared glossaries and deliver more transparency. In practice, this means that nontechnical business users who do not work with the data every day can still be self-sufficient and find the information they need without worrying about data quality. The organisation can also be confident that users are all drawing on high-quality data, and that the data is being used appropriately and in line with legal or ethical requirements.
Self-service and data preparation
Modern data preparation tools must therefore work closely with data governance functions to accelerate self-service. Self-service analytics can only function alongside self-service data preparation. It is unfortunate but true that business users who are given access to self-service analytics, but not to good-quality data, will simply pull in the data they need from whatever source they can – and then assume that the outcome will still be good. The analytics life cycle will only really work when we have self-service everywhere.
There are, therefore, two key messages about data preparation in the analytics life cycle.
The first, and perhaps the most important, is that the analytics life cycle is an integrated process. Various user groups are active in this process, and various tools operate in different phases of the life cycle. Harmonious collaboration and ease of transition from one phase to another are both essential.
I think an integrated analytics platform, covering both analytics and data preparation – and here I mean processes that ensure data quality, data integration and data governance – facilitates the entire analytics life cycle. Customers who want to accelerate the analytics process are particularly well served by an integrated platform.
The second important point is the central role of data governance. In my experience, governance is a vital support for self-service in the analytics life cycle. Users need to be self-reliant and able to find out what they need to know about the data they are using, in the right context, for example through a glossary or via metadata management. Governance is therefore an essential part of the analytics life cycle.
For more information, please register for our webinar discussing Data Preparation in the Analytics Life Cycle.