Analytics models, which are at the heart of artificial intelligence, have taken on increasing importance in businesses. They are a vital support to decision-making across organisations and networks, and executives and teams rely on them more and more. However, models are only as good as their inputs, and that includes the process that creates them. Get those wrong, and the model will provide the ‘wrong’ answers to your questions. Being data-driven in the wrong direction is likely to be worse than going on gut instinct.
We all know about the importance of choosing the right data, and making sure the quality is high. Fewer people, however, consider the process of developing models, although it is also an important input. And, as with data quality, a poor-quality process can result in problems further down the line.
Timing and capability
There are a number of issues to consider in the process of building an analytic model. The first is timing, and it is linked to data quality. Data is at its best when it is fresh. Taking several months or more to build a model therefore means that the data that underpins it is likely to be out of date by the time the model is in use. If a model is built faster, the supporting data is fresher and the model can start to generate value for the organisation sooner. This must, of course, be balanced against taking time to ensure that the model works.
The second issue is capability. Despite the rise in citizen data scientists, most companies still have a shortage of people who can build robust analytics models, especially those involving more advanced machine learning techniques. Building models often requires advanced feature engineering, feature selection, and model building and fine-tuning, and cannot easily be done without skilled data scientists. What’s more, the number of models that data scientists can build is limited by their workload and time. This issue is becoming more important as the number of analytical models, and their level of granularity, increases. With business users increasingly wanting multiple models across customer segments, data scientists are stretched thinner and thinner, because each new model needs their time and energy.
Sharing and deployment
The final part of the process of developing models is sharing and deployment. The best model in the world is no use if nobody knows of its existence. It is also important to be able to scale rapidly when necessary, and to have a governance structure that permits this to happen easily. Scaling and collaboration therefore enable businesses to test, refine and deploy models faster, and make decisions more rapidly as a result.
For any model to work, it needs to have good, solid inputs, including a reliable and effective development process. After all, garbage in, garbage out.
The search for solutions
Companies are searching for solutions that will allow data scientists to focus on where they, and only they, can add value, and reduce the amount of time they need to spend on ‘grunt work’. Reducing the number of manual steps involved in the model development process can also help to reduce the error rate. One option that is helping is automation, particularly a rapid model development and production scoring environment, coupled with an improved structure and governance for developing and deploying models.
Automated model development is likely to enable companies to use a broader array of analytical techniques, applying the right methodology for the data. Companies using these techniques are no longer limited by the preferences or capabilities of their data scientists. Their data scientists are also able to focus on really adding value to the organisation, perhaps through more complex predictive analytics.
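To make the idea of applying the right methodology for the data concrete, here is a minimal sketch of an automated comparison: every candidate technique is fitted on the same training data, scored on held-out data, and the best scorer is kept. The candidate names, the toy data and the `select_model` helper are all illustrative assumptions, not part of any particular product.

```python
from statistics import mean


def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    m = mean(ys)
    return lambda x: m


def fit_linear(xs, ys):
    """Candidate: ordinary least squares on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))


def select_model(candidates, train, holdout):
    """Fit each candidate on train, score it on holdout, keep the lowest error."""
    fitted, scores = {}, {}
    for name, fit in candidates.items():
        fitted[name] = fit(*train)
        scores[name] = mse(fitted[name], *holdout)
    best = min(scores, key=scores.get)
    return best, fitted[best], scores


# Hypothetical candidate list; a real environment would register many more.
CANDIDATES = {"mean_baseline": fit_mean, "linear": fit_linear}

# Toy data, invented for illustration only.
train = ([1, 2, 3, 4, 5, 6], [2.1, 3.9, 6.2, 7.8, 10.1, 12.0])
holdout = ([7, 8], [14.1, 15.9])

best_name, best_model, scores = select_model(CANDIDATES, train, holdout)
print(best_name, scores)
```

Because the choice is made by held-out error rather than by habit, adding a new technique is a matter of adding one entry to the candidate list.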
An open modelling approach, using templated workflows, enables collaboration with a larger team of experts, as well as across IT and business units. It also reduces the number of errors made in moving from development to operation, and in scaling up the model. The opportunity to try out new approaches rapidly, stop those that do not work, and rapidly scale up those that do is another key benefit.
Automated model development can also build in a governance framework to track steps from model initiation to retirement, ensuring that the process is transparent and managed well. Deployment to a wide range of operational systems, including batch, real-time and in-database, offers additional flexibility.
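One way to picture a governance framework that tracks a model from initiation to retirement is a small record that only permits approved stage changes and logs each one. The sketch below is an assumption-laden illustration: the stage names, the allowed transitions and the `ModelRecord` class are hypothetical, not taken from any specific tool.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical lifecycle stages and the transitions allowed between them.
TRANSITIONS = {
    "initiated": {"in_development"},
    "in_development": {"validated", "retired"},
    "validated": {"deployed", "in_development", "retired"},
    "deployed": {"in_development", "retired"},
    "retired": set(),
}


@dataclass
class ModelRecord:
    name: str
    stage: str = "initiated"
    history: list = field(default_factory=list)

    def __post_init__(self):
        # Record the starting point so the audit trail is complete.
        self._log(None, self.stage, "system", "model registered")

    def _log(self, old, new, user, note):
        self.history.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "from": old,
            "to": new,
            "by": user,
            "note": note,
        })

    def advance(self, new_stage, user, note=""):
        """Move to new_stage if the transition is allowed, and log who did it."""
        if new_stage not in TRANSITIONS[self.stage]:
            raise ValueError(f"{self.stage} -> {new_stage} is not allowed")
        self._log(self.stage, new_stage, user, note)
        self.stage = new_stage


# Example walk through the lifecycle (names are invented).
record = ModelRecord("churn_segment_a")
record.advance("in_development", "alice")
record.advance("validated", "bob", "holdout checks passed")
record.advance("deployed", "carol", "batch scoring")
record.advance("retired", "carol", "replaced by a newer model")
```

The history list gives the transparency described above: every step, who took it and when, can be reviewed after the fact, and a disallowed jump (for example, deploying a retired model) raises an error instead of passing silently.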
Data-driven in the right direction
All these benefits mean that companies can develop better models faster, and therefore gain value from them sooner by making better decisions faster. Becoming data-driven more quickly is a huge improvement, but only if the direction of travel is right. Automation may offer the answer to this conundrum.
What does scale have to do with it?
Central to machine learning is the idea that with each iteration, an algorithm learns from the data. This training cycle requires more than incorporating the latest platform, architecture or learning algorithms. Robust data pipelines matched to flexible, easy-to-use frameworks are critical. So what should be considered when designing for machine learning at scale?
Why machine learning at scale matters
We hosted a digital panel discussion on Twitter covering this theme. Read the highlights as a Storify: Why machine learning at scale matters