You’ve probably heard of DevOps, but do you know about DataOps? It builds on the DevOps approach to provide huge benefits in unlocking business value from data.
Many people have heard of DevOps, even if they don’t know precisely what it means. It is an agile approach to software delivery that brings together development and IT operations. This is designed to shorten the development life cycle and improve the quality of the software. However, what do we mean by DataOps?
Like DevOps, DataOps borrows from agile methodology. It provides an agile approach to data access, management, storage, analysis, and governance. Data scientists spend a large share of their time on data management, especially data access and preparation. This work is essential to ensure high-quality data, because we all know that garbage in means garbage out. DataOps offers a better way to operationalise data and analytics workflows, providing reliability, adaptability and, perhaps most crucially, faster time to value. It makes data more usable, and also promotes better data governance. Fundamentally, it is all about creating more business value from big data.
DataOps is not quite the same as ModelOps, the other ‘Ops’ phrase that is often mentioned in analytics circles. ModelOps is about the management of models, including their governance. In other words, ModelOps is about operationalising analytical models, while DataOps is about the data that feeds into those models.
There are several processes and elements that feed into DataOps. These include data storage (managing the data lake or data warehouse) and data governance. DataOps also uses statistical process control to monitor the data analytics pipeline, improving processing and data quality. We can liken the overall process to the one required to supply clean water. You start by collecting rainwater in a reservoir, then purify it through various systems and processes. Once it is ready, it flows into the pipes to come out of the tap in someone’s house, ready to use.
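To make the idea of statistical process control a little more concrete, here is a minimal sketch in Python. It assumes the pipeline records a daily row count for each load, and it flags any new load that falls outside three standard deviations of a baseline. The function names, the figures and the three-sigma threshold are all illustrative assumptions, not part of any particular DataOps product.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, upper) control limits from a baseline of past measurements."""
    centre = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation
    return centre - sigmas * spread, centre + sigmas * spread


def out_of_control(values, limits):
    """Return (index, value) pairs for measurements outside the control limits."""
    lower, upper = limits
    return [(i, v) for i, v in enumerate(values) if not lower <= v <= upper]


# Hypothetical daily row counts from a period when the pipeline behaved normally.
baseline_row_counts = [1000, 1010, 990, 1005, 995, 1002, 998]
limits = control_limits(baseline_row_counts)

# Hypothetical new loads: the second one has lost most of its rows.
new_row_counts = [1001, 400, 1003]
flagged = out_of_control(new_row_counts, limits)
print(flagged)
```

In this sketch only the second load is flagged, so it can be held back for investigation before it reaches anyone’s report. The same check could be applied to other pipeline measures, such as the proportion of missing values or the time taken by each load.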
The benefits of DataOps
I said before that DataOps could help to unlock value from data, and particularly to do so more quickly than manual methods. This is easy to say, but the real question is how this works in practice. Probably the best way to answer it is with some examples:
- The dataset can be configured to take into account constraints and security. For example, in a dataset on a witness protection programme, the head of the organisation might be able to see each person’s real name. However, those lower down the organisation could see only the assumed name, or possibly just anonymised data.
- The purposes for data use can be specified for each user. This means that each user’s access is tightly controlled, and the control can be time-limited. This improves the security of the data. You can even ensure that the data is right for the end user, rather than the person preparing the report. For example, someone preparing a report for the CEO might be able to access more information, in more detail if necessary, than if they were preparing a report for a more junior manager.
- Data can be held for a sensible amount of time. All data has a use-by date: a date after which it cannot sensibly be used any more because it is likely to be out-of-date. DataOps can enable this date to be specified, so that data is automatically removed after that date. This is useful for many applications, but particularly clinical trials data in life sciences, and personal information of any kind.
- It is easier to democratise analytics with DataOps. DataOps makes sure that data is ready to use, and accessible to those who need it, at the right level. It therefore fits well with a low code/no code environment, where users can build models quickly and easily to answer business questions.
- Data can be transformed automatically. One of the biggest hold-ups in analytics is getting the right data in the right form. DataOps means that you can be confident that all data can be transformed automatically to fit users’ requirements. This again improves data quality and governance.
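Two of the examples above, role-based views and use-by dates, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` fields, the role names `head` and `case_officer`, and the functions `view_for` and `purge_expired` are hypothetical, and a real DataOps platform would enforce these rules in its own policy layer rather than in application code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    real_name: str
    assumed_name: str
    expires_on: date  # the use-by date for this record


def view_for(record, role):
    """Return only the fields that the given (hypothetical) role may see."""
    if role == "head":
        return {"real_name": record.real_name, "assumed_name": record.assumed_name}
    if role == "case_officer":
        return {"assumed_name": record.assumed_name}
    return {}  # any other role sees nothing


def purge_expired(records, today):
    """Keep only the records whose use-by date has not yet passed."""
    return [r for r in records if r.expires_on >= today]


records = [
    Record("Alice Smith", "Jane Doe", date(2030, 1, 1)),
    Record("Bob Jones", "John Roe", date(2020, 1, 1)),
]

current = purge_expired(records, today=date(2025, 6, 1))
print(view_for(current[0], "head"))
print(view_for(current[0], "case_officer"))
```

Here the second record is dropped because its use-by date has passed, and the case officer sees only the assumed name for the record that remains. Defaulting to an empty view for unknown roles is a deliberate choice in this sketch, so that a misconfigured role fails closed rather than open.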
Effectively, DataOps gives you the equivalent of an online shop for data. Users can access the data that they need, in the required format, over and over again, knowing that it is up-to-date and accurate. Crucially, just like turning on the tap in your house, you as the user do not have to worry about whether you have the right data. All you have to do is to input your requirements.