They say nothing in life is certain other than death and taxes, but there's something else I’ve found I can count on from experience: sending out invites for a party on social media only to receive a few affirmative responses and a whole slew of “maybe”. I know my friends
Tag: data science basics
This resource is designed primarily for beginner to intermediate data scientists or analysts who are interested in identifying and applying machine learning algorithms to address the problems that interest them. A typical question asked by a beginner, when facing a wide variety of machine learning algorithms, is “which algorithm should
Through hyperparameter autotuning, you can maximize a model's performance without maximizing effort. While SAS searches the hyperparameter space in the background, you are free to pursue other work.
We have updated our software for improved interpretability since this post was written. For the latest on this topic, read our new series on model-agnostic interpretability. As machine learning takes its place in many recent advances in science and technology, the interpretability of machine learning models grows in importance. We
This is the first in a series of posts about machine learning concepts, where we'll cover everything from learning styles to new dimensions in machine learning research. What makes machine learning so successful? The answer lies in the core concept of machine learning: a machine can learn from examples and
This is the final post in my series of machine learning best practices. If you missed the earlier posts, start at the beginning, or read the whole series by clicking on the image to the right. While post four in the series was about combining different types of models, this
This is the seventh post in my series of machine learning best practices. Catch up by reading the first post or the whole series now. Generalization is the learned model’s ability to fit well to new, unseen data instead of the data it was trained on. Overfitting refers to a model that fits
This is the sixth post in my series of machine learning best practices. If you've come across the series for the first time, you can go back to the beginning or read the whole series. Aristotle was likely one of the first data scientists who studied empiricism by learning through
This is the fifth post in my series of machine learning best practices. Hyperparameters are the algorithm options one "turns and tunes" when building a learning model. Hyperparameters cannot be learned by the algorithm itself, so they need to be assigned before the model is trained. A lot of manual
This is the fourth post in my series of 10 machine learning best practices. It’s common to build models on historical training data and then apply the model to new data to make decisions. This process is called model deployment or scoring. I often hear data scientists say, “It took
This is the third post in my series of machine learning techniques and best practices. If you missed the earlier posts, read the first one now, or review the whole machine learning best practices series. Data scientists commonly use machine learning algorithms, such as gradient boosting and decision forests, that automatically build
This is the second post in my series of machine learning best practices. If you missed it, read the first post, Machine learning best practices: the basics. As we go along, all ten tips will be archived at this machine learning best practices page. Machine learning commonly requires the use of
I started my training in machine learning at the University of Tennessee in the late 1980s. Of course, we didn’t call it machine learning then, and we didn’t call ourselves data scientists yet either. We used terms like statistics, analytics, data mining and data modeling. Regardless of what you call
When building models, data scientists and statisticians often talk about penalty, regularization and shrinkage. What do these terms mean and why are they important? According to Wikipedia, regularization "refers to a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting. This information usually