Machine learning best practices: combining lots of models


This is the third post in my series of machine learning techniques and best practices. If you missed the earlier posts, read the first one now, or review the whole machine learning best practices series.  

Data scientists commonly use machine learning algorithms, such as gradient boosting and decision forests, that automatically build many models for you. The individual models are then combined to form a potentially stronger solution. Gradient boosted trees, in particular, rank among the most accurate machine learning classifiers. In my own supervised learning efforts, I almost always try each of these models as challengers.

When using random forests, be careful not to set the tree depth too shallow. The goal of decision forests is to grow many large, deep trees at random (think forests, not bushes). Individually, deep trees tend to overfit the data and not generalize well, but a combination of them captures the nuances of the space in a generalized fashion.
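Here's a minimal sketch of that point using scikit-learn (my choice of tool here, not the post's): leaving the tree depth unconstrained lets each tree grow deep, and the forest's averaging takes care of generalization.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification data, purely for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Deep trees (max_depth=None grows each tree until its leaves are pure).
deep = RandomForestClassifier(n_estimators=100, max_depth=None, random_state=0)
# Shallow "bushes" for comparison.
shallow = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)

deep_score = cross_val_score(deep, X, y, cv=5).mean()
shallow_score = cross_val_score(shallow, X, y, cv=5).mean()
print(f"deep trees:    {deep_score:.3f}")
print(f"shallow trees: {shallow_score:.3f}")
```

On most datasets you'll see the forest of deep trees hold its own or win, because the variance of each overfit tree is averaged away across the ensemble.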

Some algorithms fit better than others within specific regions or boundaries of the data.  A best practice is to combine different modeling algorithms.  You may also want to place more emphasis or weight on the modeling method that has the overall best classification or fit on the validation data. Sometimes two weak classifiers can do a better job than one strong classifier in specific spaces of your training data.
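A weighted combination of different modeling algorithms can be sketched as follows, again using scikit-learn as an assumed toolset. The weights here are purely illustrative; in practice you'd set them based on each model's performance on validation data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

# Combine a decision tree, a support vector machine and a neural network
# with a soft (probability-averaging), weighted vote.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
        ("nnet", MLPClassifier(max_iter=1000, random_state=0)),
    ],
    voting="soft",
    weights=[1, 2, 2],  # illustrative weights; tune on validation data
)
ensemble.fit(X_tr, y_tr)
acc = ensemble.score(X_val, y_val)
print(f"validation accuracy: {acc:.3f}")
```

Dropping the `weights` argument gives the unweighted variant; either way, each base learner gets a chance to dominate in the regions of the data where it fits best.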

Figure 3.  An ensemble model that combines a decision tree, support vector machine and neural network, either weighted or unweighted.


As you become experienced with machine learning and master more techniques, you’ll find yourself continuing to address rare event modeling problems by combining techniques.

Recently, one of my colleagues developed a model to identify unlicensed money service businesses. The event rate was about 0.09%. To solve the problem, he used multiple techniques:

  • First, he developed k-fold samples by randomly selecting a subsample of nonevents in each of his 200 folds, while making sure he kept all the events in each fold.
  • He then built a random forest model in each fold.
  • Lastly, he ensembled the 200 random forests, which ended up being the best classifier among all the models he developed.
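The steps above can be sketched roughly as follows, under some assumptions on my part (NumPy and scikit-learn as the tools, a 5:1 nonevent subsample per fold, and far fewer folds than the 200 he used, to keep the sketch fast):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Imbalanced toy data (~1% events); the real problem was closer to 0.09%.
X, y = make_classification(n_samples=20000, weights=[0.99], random_state=0)

event_idx = np.flatnonzero(y == 1)
nonevent_idx = np.flatnonzero(y == 0)

n_folds = 20  # the post describes 200 folds; reduced here for speed
forests = []
for _ in range(n_folds):
    # Keep ALL events in every fold; subsample the nonevents at random.
    sample = rng.choice(nonevent_idx, size=len(event_idx) * 5, replace=False)
    idx = np.concatenate([event_idx, sample])
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    forests.append(rf.fit(X[idx], y[idx]))

# Ensemble the forests by averaging their predicted event probabilities.
avg_prob = np.mean([rf.predict_proba(X)[:, 1] for rf in forests], axis=0)
```

Because every fold sees all the events but only a slice of the nonevents, each forest trains on far more balanced data, and the averaged score recovers a stable ranking of the rare events.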

This is a computationally demanding problem, so it's important to be able to build the models in parallel across several data nodes so that they train quickly.

Figure 4.  Build multiple base classifiers using subsamples for a rare events problem. Combine the base classifiers later.


If there are other tips you want me to cover, or if you have tips of your own to share, leave a comment on this post.

My next post will be about model deployment, and you can click the image below to read all 10 machine learning best practices.


About Author

Wayne Thompson

Manager Data Science Technologies

Wayne Thompson, Chief Data Scientist at SAS, is a globally renowned presenter, teacher, practitioner and innovator in the fields of data mining and machine learning. He has worked alongside the world's biggest and most challenging organizations to help them harness analytics to build high-performing organizations. Over the course of his 24-year tenure at SAS, Wayne has been credited with bringing to market landmark SAS analytics technologies, including SAS Text Miner, SAS Credit Scoring for Enterprise Miner, SAS Model Manager, SAS Rapid Predictive Modeler, SAS Visual Statistics and more. His current focus initiatives include easy-to-use self-service data mining tools along with deep learning and cognitive computing tool kits.
