The DO Loop
Statistical programming in SAS with an emphasis on SAS/IML programs

Influential observations in a linear regression model: The DFFITS and Cook's D statistics
A previous article describes the DFBETAS statistics for detecting influential observations, where "influential" means that if you delete the observation and refit the model, the estimates for the regression coefficients change substantially. Of course, there are other statistics that you could use to measure influence. Two popular ones are the
Influential observations in a linear regression model: The DFBETAS statistics
My article about deletion diagnostics investigated how influential an observation is to a least squares regression model. In other words, if you delete the i_th observation and refit the model, what happens to the statistics for the model? SAS regression procedures provide many tables and graphs that enable you to
Leave-one-out statistics and a formula to update a matrix inverse
For linear regression models, there is a class of statistics that I call deletion diagnostics or leave-one-out statistics. These observation-wise statistics address the question, "If I delete the i_th observation and refit the model, what happens to the statistics for the model?" For example: The PRESS statistic is similar to
5 reasons to use PROC FORMAT to recode variables in SAS
Recoding variables can be tedious, but it is often a necessary part of data analysis. Almost every SAS programmer has written a DATA step that uses IF-THEN/ELSE logic or the SELECT-WHEN statements to recode variables. Although creating a new variable is effective, it is also inefficient because you have to
Plot a family of curves in SAS
A family of curves is generated by an equation that has one or more parameters. To visualize the family, you might want to display a graph that overlays four or five curves that have different parameter values, as shown to the right. The graph shows members of a family of
Graph wide data and long data in SAS
Statistical programmers and analysts often use two kinds of rectangular data sets, popularly known as wide data and long data. Some analytical procedures require that the data be in wide form; others require long form. (The "long format" is sometimes called "narrow" or "tall" data.) Fortunately, the statistical graphics procedures