The DO Loop
Statistical programming in SAS with an emphasis on SAS/IML programs
![Model selection with PROC GLMSELECT](https://blogs.sas.com/content/iml/files/2019/02/ModelSelectAnim.gif)
I previously discussed how you can use validation data to choose between a set of competing regression models. In that article, I manually evaluated seven models for a continuous response on the training data and manually chose the model that gave the best predictions for the validation data. Fortunately, SAS
![Model assessment and selection in machine learning](https://blogs.sas.com/content/iml/files/2019/01/validation2-640x336.png)
Machine learning differs from classical statistics in the way it assesses and compares competing models. In classical statistics, you use all the data to fit each model. You choose between models by using a statistic (such as AIC, AICC, SBC, ...) that measures both the goodness of fit and the
![Simulate data for a regression model with categorical and continuous variables: parameter estimates for synthetic (simulated) data that follows a regression model](https://blogs.sas.com/content/iml/files/2019/01/simGLM1-379x336.png)
This article shows how to use SAS to simulate data that fits a linear regression model that has categorical regressors (also called explanatory or CLASS variables). Simulating data is a useful skill for both researchers and statistical programmers. You can use simulation for answering research questions, but you can also
![Coding and simulating categorical variables in regression models](https://blogs.sas.com/content/iml/files/2017/01/ProgrammingTips-2.png)
Recently I was asked to explain the result of an ANOVA that I posted to a statistical discussion forum. My program included some simulated data for an ANOVA model and a call to the GLM procedure to estimate the parameters. I was asked why the parameter estimates from PROC
![Create training, validation, and test data sets in SAS: partition data into training, validation, and testing in SAS](https://blogs.sas.com/content/iml/files/2019/01/partition1.png)
In machine learning and other model building techniques, it is common to partition a large data set into three segments: training, validation, and testing. Training data is used to fit each model. Validation data is a random sample that is used for model selection. These data are used to select
![Three ways to add a line to a Q-Q plot](https://blogs.sas.com/content/iml/files/2019/01/qqline0-640x336.png)
A quantile-quantile plot (Q-Q plot) is a graphical tool that compares a data distribution and a specified probability distribution. If the points in a Q-Q plot appear to fall on a straight line, that is evidence that the data can be approximately modeled by the target distribution. Although it is