There is a saying in business that you can have any two out of good, fast and cheap. You cannot have all three, except perhaps in an ideal world. There is therefore a strategic trade-off between the three, with a recognition that every business has a different balance point.
I think a similar triangle is emerging in analytics, involving agility, quality and reliability. The question is whether there is a similar trade-off, or whether it is possible to have all three. And if there is a trade-off, is the decision a ‘once and for all’ one, or can it change with the situation? The weights in the triangle might not always be evenly distributed. At one point in time agility might carry more weight, whereas later the weight of reliability might increase, which would make me rethink my decision.
Defining the terms
Agility often seems to be the ‘holy grail’ of analytics: the ability to move rapidly and flexibly in whatever direction is required. In practice, it means being able to scale up or down on demand, and to adopt different options in different situations. It can also, however, mean a lack of control. If users are allowed to adopt new solutions that work for them, there can be issues with interoperability and reliability later. Agility can also be the enemy of quality: if things are done too quickly, are they done right? Quick results may be based on data that have not been checked for quality, or on data whose content varies over time, with a big impact on validity.
Reliability, by contrast, often seems to be the villain of the piece in the analytics world. It is anything but agile: slow, secure and safe. But safe is not always a bad thing, and neither is reliability. Good controls on the adoption of technology and reliable data governance may lead to better quality in the longer term. Rules are often there for a reason, and data quality may depend on them. But in some situations, the value of data lies in freshness. Data become less useful over time, and reliability, too, can therefore be the enemy of quality.
Quality, in this context, is the value that emerges from analytics processes. This value depends on several aspects of quality. First, it is highly susceptible to the ‘garbage in, garbage out’ rule: in other words, if your data quality is poor, so will be any decisions that are based on that data. Second, the choice of model and its tuning have a big influence on the value. It might be possible to get a better result by tuning, but there is a cost associated with it. At the end of the day, the value should translate into an amount of money. But at what point is data quality good enough, and when do I know that further tuning of the model would not pay off?
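One way to make that last question concrete is a simple stopping rule: keep tuning only while the expected value of the next round exceeds what the round costs. The sketch below is purely illustrative. The function names, the gain figures and the monetary values are all hypothetical assumptions, and in practice the expected gain of a tuning round would itself have to be estimated.

```python
def should_keep_tuning(expected_gain, value_per_point, tuning_cost):
    """Return True if another tuning round is expected to pay for itself.

    expected_gain   -- assumed improvement from the next round (e.g. accuracy points)
    value_per_point -- assumed monetary value of one point of improvement
    tuning_cost     -- assumed cost of running one more tuning round
    """
    return expected_gain * value_per_point > tuning_cost


def rounds_worth_running(gains, value_per_point, tuning_cost):
    """Count how many consecutive tuning rounds pay off before returns diminish."""
    rounds = 0
    for gain in gains:
        if not should_keep_tuning(gain, value_per_point, tuning_cost):
            break  # the next round would cost more than it is expected to return
        rounds += 1
    return rounds


# Hypothetical figures: each round improves the model less than the one before.
gains = [2.0, 1.0, 0.5, 0.2]
print(rounds_worth_running(gains, value_per_point=1000, tuning_cost=600))  # prints 2
```

With these made-up numbers, the third round would be worth 500 against a cost of 600, so tuning stops after two rounds. The same reasoning applies to data cleaning: ‘good enough’ is the point at which the next improvement costs more than it returns.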
A trade-off between agility and reliability?
There is, therefore, clearly a trade-off between agility and reliability, but is quality always at the heart of it? It is hard to have agility while maintaining reliability, and different situations will call for different measures. Sometimes agility will be essential: fast decisions will be needed, and ‘quick and clean enough’ will be better than ‘slower but perfect’. Sometimes, however, speed will result in poor-quality decisions.
The challenge here is to ensure ‘clean (or good) enough’, and that is where quality comes in, or rather the required level of quality, which must be defined for the situation. It is also important to recognize the value of ‘sometimes’. There is no ‘one size fits all’, even in a single organisation. Sometimes speed will be more important and sometimes reliability, and the weighting may vary over time.
This debate is coalescing around the issue of citizen data scientists and self-service analytics. Self-service analytics packages mean that business users are now able to do some or all of their own analysis. Supported by IT in the background, they can take their data and manipulate it to generate their own insights. They have become citizen data scientists. A good citizen is responsible for the results they create, and so must always balance the triangle in order to get the most out of it.
The effect of self-service on the analytics triangle
This trend towards self-service has resulted in a big improvement in agility. Business users no longer have to wait for IT and data scientists to be ready to help them with analysis, and can just get on and do it themselves. But what about reliability? That’s where it gets interesting. In quite a number of organisations, IT and data science teams have used the time freed up by business users doing their own analytics to work on data governance. Data governance is the set of rules around data management, and it improves the way in which data is cleaned and managed. Done correctly, it can speed up data cleaning and make data more accessible and easier to use, as well as more accurate.
In other words, the rise of citizen data scientists has been able to improve both reliability and agility. And in doing so, it has improved quality. Perhaps there is still a trade-off between the three, and a balance point to be found. But it is also clear that all three can get better at the same time, and that citizen data scientists are enabling this change.