Machine learning seems to be the new hot topic these days. Everybody's talking about how machines are beating human players in chess, Jeopardy, and now even Go. In the future, artificial intelligence will drive our cars and our jobs will be taken over by robots. There’s a lot of hype, a lot of fear and uncertainty -- as there often is when new technology has the potential to disrupt our societies.
However, when you talk to the people developing these new types of intelligent algorithms, you get quite a different picture. Today, there’s a lot of manual work involved in automating decision processes. Developing algorithms that can make decisions in even a “weakly” intelligent way is hard work.
I call them “weak” intelligent algorithms, since we’ve only developed algorithms that can do one thing. They might be able to do this one thing extraordinarily well, like playing Go or playing chess. But if you ask the algorithm that can play Go to drive your car, it will fail. So, we’re still a long way from developing the highly intelligent machine.
Cumbersome trial-and-error approach
What we can do, though, is apply algorithms to almost any kind of digital data to extract information automatically and make decisions in a seemingly intelligent way. Developing these algorithms, a practice known as machine learning, can be a cumbersome journey.
The usual approach is trial and error: try candidate algorithms until one works well for the problem at hand. A data scientist will typically choose algorithms based on practical experience and personal preference. That's okay, because there is rarely a single correct way to build a machine learning model. Many algorithms have been developed to automate manual and tedious steps of the machine learning pipeline: for example, to loosen the prerequisites under which machine learning theories and approaches apply, to create input features automatically and select the best predictors, and to test different modeling algorithms and choose the best model. But it still takes a lot of lab work to build a machine learning model with trustworthy results.
A big chunk of this manual work relates to finding the optimal set of hyperparameters for a chosen modeling algorithm. Hyperparameters are the settings that define the model applied to a data set for automated information extraction. They are chosen before or around training rather than learned from the data, for instance the depth of a decision tree or the number of layers in a neural network.
For example, if I decide to build a machine learning model to predict which customer is a good credit risk, I need to make many decisions during the training process. I need to choose which modeling approaches to test, which data to use to train the model, which data to use to test the results, how to tune the parameters of the chosen model and how to validate the results.
All these choices will impact the outcome of my model building exercise, and eventually the final model selected. Since this model will be used to decide which customers will get credit, it’s vital that we have high confidence in the model to make decisions we can trust.
A large portion of the model building process – besides the analytical data preparation that still takes the lion’s share of the time – is taken up by experiments to identify the optimal set of parameters for the model algorithm. Here we quickly get into the curse of dimensionality. Modern machine learning algorithms have a large number of parameters that need to be tuned during the model training process. There’s also a trend to develop more and more complex algorithms that can automatically drill deeper into the data to find more subtle patterns.
For example, we’re seeing a development from shallow neural networks to deep neural networks, and from simple decision trees to random forests and gradient boosting algorithms. While these algorithms improve the chances of building accurate, stable predictive models for more complex business problems (such as fraud detection, image processing, speech recognition and cognitive computing), they also require a much larger number of parameters to be tuned during training. So, if I have 10 parameters that need to be tuned to an optimal setting and each parameter can take 10 different values (these are very conservative numbers), I end up with 100 parameter-value pairs and as many as 10^10 combinations to test. And this applies to only a single modeling approach. If I’d like to test different algorithms, this number grows very quickly.
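To make the arithmetic above concrete, here is a minimal Python sketch. The figure of one second per training run is purely an assumption for illustration, and the name grid_size is hypothetical; real training times vary enormously.

```python
from math import prod

def grid_size(values_per_parameter):
    """Number of combinations in a full grid: the product of the value counts."""
    return prod(values_per_parameter)

n_params = 10   # parameters to tune (from the example in the text)
n_values = 10   # candidate values per parameter (from the example in the text)

# 10 parameters x 10 values each = 100 parameter-value pairs
total_pairs = n_params * n_values

# Every combination of one value per parameter = 10^10 combinations
total_combinations = grid_size([n_values] * n_params)

# ASSUMPTION for illustration only: each training run takes one second.
seconds_per_run = 1.0
seconds_per_year = 60 * 60 * 24 * 365
years_needed = total_combinations * seconds_per_run / seconds_per_year

print(f"{total_pairs} parameter-value pairs")
print(f"{total_combinations:,} combinations")
print(f"about {years_needed:.0f} years at one second per run")
```

Under that assumed cost, an exhaustive test of a single modeling approach would take on the order of centuries on one machine, which is why trying every combination is not practical.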
Speedy autotuning approach
So, what can we do? There are several ways to support the data scientist in this cumbersome lab work of tuning machine learning model parameters. These approaches are called hyperparameter optimization.
In general, there are three different types: parameter sweep, random search and parameter optimization.
- Parameter sweep: This is an exhaustive search through a pre-defined set of parameter values. The data scientist selects the candidates of values for each parameter to tune, trains a model with each possible combination and selects the best-performing model. Here, the outcome very much depends on the experience and selection of the data scientist.
- Random search: This is a search through randomly selected combinations of values for the model parameters. With modern computers, this can provide a less biased approach to finding an optimal set of parameters for the selected model. But since the search is random, it's possible to miss the optimal set unless a sufficient number of experiments are conducted, which can be expensive.
- Parameter optimization: Again there are different approaches, but they all apply modern optimization techniques to find the optimal solution. In my opinion, this is the best way to find the most appropriate set of parameters for any predictive model, and any business problem, in the least expensive way. It's the “optimal solution,” so to speak.
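The three approaches in the list above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: validation_score is a toy stand-in for "train a model and score it", the parameter names and ranges are invented, and the local_search function is a generic hill climb, not the optimization routine any particular product uses.

```python
import itertools
import random

def validation_score(learning_rate, depth):
    """Toy stand-in for training and scoring a model (hypothetical).
    Higher is better; the best score is at learning_rate=0.1, depth=6."""
    return -((learning_rate - 0.1) ** 2) * 100 - ((depth - 6) ** 2) * 0.01

# 1) Parameter sweep: try every combination of hand-picked candidate values.
learning_rates = [0.01, 0.05, 0.1, 0.5]
depths = [2, 4, 6, 8]
sweep_candidates = list(itertools.product(learning_rates, depths))
best_sweep = max(sweep_candidates, key=lambda c: validation_score(*c))

# 2) Random search: sample the same number of combinations at random.
rng = random.Random(0)  # fixed seed so the run is repeatable
random_candidates = [
    (rng.uniform(0.01, 0.5), rng.randint(2, 8)) for _ in range(16)
]
best_random = max(random_candidates, key=lambda c: validation_score(*c))

# 3) Parameter optimization: let earlier results guide the next trial.
#    A simple hill climb: move to the best neighbor, shrink the step if stuck.
def local_search(start, steps=50):
    current = start
    current_score = validation_score(*current)
    step_lr = 0.1
    for _ in range(steps):
        lr, depth = current
        neighbors = [
            (max(0.01, lr - step_lr), depth),
            (min(0.5, lr + step_lr), depth),
            (lr, max(2, depth - 1)),
            (lr, min(8, depth + 1)),
        ]
        best_neighbor = max(neighbors, key=lambda c: validation_score(*c))
        best_neighbor_score = validation_score(*best_neighbor)
        if best_neighbor_score > current_score:
            current, current_score = best_neighbor, best_neighbor_score
        else:
            step_lr /= 2  # no improvement: search more finely
    return current

best_local = local_search(start=(0.5, 2))

print("sweep: ", best_sweep)
print("random:", best_random)
print("local: ", best_local)
```

The sweep can only ever return one of the values the data scientist listed, random search depends on how many samples it can afford, and the guided search spends its trials near settings that already look promising. In a real project each call to validation_score would be a full training run, so the number of calls is what drives the cost.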
SAS has conducted lots of research into hyperparameter tuning -- we call it autotuning. It’s now possible to quickly and easily find the optimal parameter settings for diverse machine learning algorithms such as decision trees, random forests, gradient boosting, neural networks, support vector machines and factorization machines by simply selecting the option you want. In the background there are complex local search optimization routines hard at work that tune the models efficiently and effectively. I’m convinced this new capability will be a great help to the modern data scientist. They'll find the best model much more quickly and with more confidence. For the business, this means getting value out of machine learning much faster.
Find out more about SAS’ unique patent-pending solution to hyperparameter optimization: SAS® Visual Data Mining and Machine Learning. And you can also read this paper to find out more about autotuning.