The popular Spanish saying "El que mucho abarca poco aprieta" (roughly, "he who grasps at too much squeezes little") has taken on great relevance in the era of data analysis. When data scientists, data analysts, and analysts in general build models to predict behaviors, trends, patterns, and so on, they face the challenge of grasping just enough to be able to squeeze tight…
This is the seventh post in my series of machine learning best practices. Catch up by reading the first post or the whole series now. Generalization is the learned model’s ability to fit well to new, unseen data rather than only the data it was trained on. Overfitting refers to a model that fits the training data too closely and therefore generalizes poorly…
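A minimal sketch of the distinction (my own illustration using scikit-learn on synthetic data, not code from the post): an unconstrained model can fit the training set almost perfectly while scoring much worse on held-out data, and that gap is the signature of overfitting.

```python
# Sketch: comparing training fit vs. generalization on held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unconstrained tree can memorize the training set...
deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("deep    train/test:", deep.score(X_train, y_train), deep.score(X_test, y_test))

# ...while a depth-limited tree trades training fit for generalization.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print("shallow train/test:", shallow.score(X_train, y_train), shallow.score(X_test, y_test))
```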
When building models, data scientists and statisticians often talk about penalty, regularization, and shrinkage. What do these terms mean, and why are they important? According to Wikipedia, regularization "refers to a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting. This information usually takes the form of a penalty for complexity, such as restrictions for smoothness or bounds on the vector space norm."
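As a concrete illustration (my own sketch, not the post's code): ridge regression adds an L2 penalty α‖w‖² to the least-squares objective, which shrinks the fitted coefficients toward zero; the `alpha` parameter below controls the penalty strength.

```python
# Sketch: L2 regularization ("ridge") shrinks coefficients relative to OLS.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
y = 3.0 * X[:, 0] + rng.normal(scale=2.0, size=50)  # only feature 0 matters

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)  # larger alpha -> stronger shrinkage

print("OLS coefficient norm:  ", np.linalg.norm(ols.coef_))
print("ridge coefficient norm:", np.linalg.norm(ridge.coef_))  # smaller
```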
Ensemble methods are commonly used to boost predictive accuracy by combining the predictions of multiple machine learning models. The traditional wisdom has been to combine so-called “weak” learners. However, a more modern approach is to create an ensemble of a well-chosen collection of strong yet diverse models. Building powerful ensemble models…
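One way to picture the “strong yet diverse” idea (a sketch of my own using scikit-learn, not the post's code): combine a linear model, bagged trees, and boosted trees, whose errors tend to differ, through soft voting.

```python
# Sketch: a soft-voting ensemble of strong but diverse models.
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

base_models = [
    ("lr", LogisticRegression(max_iter=1000)),           # linear boundary
    ("rf", RandomForestClassifier(random_state=0)),      # bagged trees
    ("gb", GradientBoostingClassifier(random_state=0)),  # boosted trees
]
# Soft voting averages the predicted class probabilities of the members.
ensemble = VotingClassifier(estimators=base_models, voting="soft")

for name, model in base_models + [("ensemble", ensemble)]:
    print(name, round(cross_val_score(model, X, y, cv=5).mean(), 3))
```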
Why the Attraction for the Offensive Paradigm? In addition to the reasons provided by Green and Armstrong, I'd like to add one more reason for the lure of complexity: You can always add complexity to a model to better fit the history. In fact, you can always create a model complex enough to fit the history perfectly…
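To make that concrete (my own toy example, not one from Green and Armstrong): with a flexible enough model you can drive the error on historical data arbitrarily low, even as accuracy on new data deteriorates. In the sketch below, raising the polynomial degree always improves the fit to history while eventually worsening the fit to “future” points.

```python
# Sketch: more complexity always fits the history better, not the future.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x_hist = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)  # "history"
y_hist = np.sin(2 * np.pi * x_hist).ravel() + rng.normal(scale=0.3, size=30)
x_new = rng.uniform(0, 1, 30).reshape(-1, 1)            # "future"
y_new = np.sin(2 * np.pi * x_new).ravel() + rng.normal(scale=0.3, size=30)

for degree in (1, 3, 9, 15):
    fit = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    fit.fit(x_hist, y_hist)
    print(f"degree {degree:2d}: "
          f"history MSE={mean_squared_error(y_hist, fit.predict(x_hist)):.3f}  "
          f"future MSE={mean_squared_error(y_new, fit.predict(x_new)):.3f}")
```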