If I were a carpenter … thoughts on the recent SAS Statistical Software Release


One question I get asked a lot is: What is the most exciting new statistical feature in the 14.1 release? And people get a bit frustrated when I say: It depends.

But it does depend! SAS statistical software provides a broad array of capabilities that help users track disease outbreaks, predict cell phone plan choices, design clinical trials, improve health care utilization, plan agricultural experiments, create more effective web sites, and determine insurance premiums. And those are just a few examples. Which new feature makes your day depends on your particular area of statistical practice.

So, let me answer the question this way:

If I were a sample survey researcher, I would be excited about new techniques for dealing with missing data from surveys!

Nonresponse in surveys is a big problem. Even in government surveys conducted in person by researchers, there are questions that respondents don’t want to answer, such as questions about income or diet. Missing data matters because the results of the analysis may be biased if nonrespondents differ from respondents, and the results may also be less precise.

One way to deal with missing data is to impute them; in other words, replace them with values observed from other respondents to the same question. In SAS/STAT 14.1, this is what the new SURVEYIMPUTE procedure does. Once you produce a data set with the imputed values, you can analyze it with the usual survey data analysis procedures, incorporating weights that account for the imputation.

Imputation methods in PROC SURVEYIMPUTE include single and multiple hot-deck imputation and fully efficient fractional imputation (FEFI). Donor selection techniques include simple random selection with or without replacement, probability proportional to weights selection, and approximate Bayesian bootstrap selection.
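To make the idea of donor selection concrete, here is a minimal sketch in Python. It is not SAS syntax and not how PROC SURVEYIMPUTE is implemented; the function name `hot_deck_impute` and the toy income data are made up for illustration. It shows single hot-deck imputation with either simple random selection with replacement or selection proportional to the donors' weights.

```python
import random


def hot_deck_impute(values, weights=None, seed=0):
    """Fill None entries with values drawn from observed donors.

    Hypothetical helper for illustration only. If `weights` is given,
    donors are selected with probability proportional to their weights;
    otherwise by simple random selection with replacement.
    """
    rng = random.Random(seed)
    # Donors are the positions where the question was actually answered.
    donor_idx = [i for i, v in enumerate(values) if v is not None]
    if not donor_idx:
        raise ValueError("no observed values to use as donors")
    donor_w = None if weights is None else [weights[i] for i in donor_idx]

    imputed = []
    for v in values:
        if v is None:
            # Pick one donor and copy its observed value.
            i = rng.choices(donor_idx, weights=donor_w, k=1)[0]
            imputed.append(values[i])
        else:
            imputed.append(v)
    return imputed


# Toy survey item with two nonrespondents (None marks a missing answer).
income = [52000, None, 48000, 61000, None, 39000]
print(hot_deck_impute(income))
```

Every imputed value is a value that some respondent actually reported, which is what distinguishes hot-deck methods from model-based imputation.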

If I were a data scientist, I would be thrilled with new methods for fitting generalized additive models!

These models are useful because you can include covariate effects that enter the model in a complex, nonlinear fashion, without specifying a parametric form as in standard regression. The new GAMPL procedure, which relies on penalized likelihood estimation, is appropriate for problems that can have hundreds or even thousands of covariates. The GAMPL procedure is also a high-performance procedure that can operate in a distributed computing environment when you have a SAS High Performance Statistics license.

As a data scientist, I would also be thrilled with new model selection methods in the GLMSELECT and HPGENSELECT procedures. In recent releases, SAS/STAT has added techniques for dealing with hundreds or thousands of predictors. For instance, the group LASSO method is now available in the GLMSELECT procedure, and the LASSO method is now available in the HPGENSELECT procedure.

If I were an ecologist, I would be excited by the new and improved HPSPLIT procedure!

I would be excited to see the graphical displays that HPSPLIT now produces for classification and regression trees, now specified with a familiar modeling syntax. These techniques, which have been a mainstay of the data mining and machine learning communities, are well suited for complex ecological data.

If I were a healthcare analyst, I would be pleased with control charts for rare events now available in SAS/QC!

These specialized control charts, produced by the new RAREEVENTS procedure, are helpful for monitoring infrequent adverse events. For example, when used in hospitals, rare events charts can signal increases in surgical site infections, medication errors, and accidental needle sticks that put patients at risk.

If I were an educational researcher, I would be happy with a new technique that expands the models I can fit to educational assessment data!

For example, when fitting models that incorporate hierarchies of class, school, and county with random effects, the computations can take an enormous amount of time, and sometimes the problem is too unwieldy for a solution.

The new FASTQUAD option in the GLIMMIX procedure provides a technique that addresses this issue. Problems that once took hours (and sometimes days) to solve now take minutes, and researchers can now fit the models they want instead of the models that can be computed.

If I were a statistician doing oncology research, I would be pleased that I can now compute power for proportional hazards regression!

This regression method for time-to-event data is heavily used in analyzing cancer treatment data. And now you can use SAS in the design phase of the clinical trial, since power computations are used to determine the number of subjects that have to be enrolled to produce the desired power for the analysis.
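As a rough illustration of how a power requirement turns into an enrollment target, here is a sketch in Python that uses Schoenfeld's approximation for a two-arm comparison. This is a textbook formula that I am substituting for illustration; it is not necessarily the method SAS uses, and the function names and the example inputs (hazard ratio 0.7, 60% of subjects expected to have the event) are assumptions.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha=0.05, power=0.8, alloc=0.5):
    """Approximate number of events needed (Schoenfeld's formula).

    hazard_ratio: effect size to detect between two arms
    alpha:        two-sided significance level
    alloc:        proportion of subjects allocated to one arm
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    log_hr = math.log(hazard_ratio)
    return (z_alpha + z_beta) ** 2 / (alloc * (1 - alloc) * log_hr ** 2)


def required_subjects(hazard_ratio, event_prob, **kwargs):
    """Convert required events to subjects, given the probability that
    an enrolled subject experiences the event during the study."""
    return math.ceil(required_events(hazard_ratio, **kwargs) / event_prob)


# Hypothetical design: detect HR = 0.7 with 80% power, 60% event probability.
print(required_events(0.7))
print(required_subjects(0.7, event_prob=0.6))
```

The point of the sketch is that power depends on the number of events rather than the number of subjects, so the enrollment target grows as the expected event probability falls.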

I would also be pleased to find that I can now perform nonparametric analysis for competing risk data. Techniques for dealing with competing risks (those events that may result in outcomes that are being measured, such as death, but are not produced by the disease under study) are an important advance in survival analysis.
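One standard nonparametric quantity for competing risk data is the cumulative incidence function, which gives the probability of having experienced a particular type of event by a given time while accounting for the other event types. The Python sketch below computes it from scratch as a language-neutral illustration; the function name, the coding of causes (0 for censored, positive integers for event types), and the toy data are assumptions, and nothing here is SAS syntax.

```python
def cumulative_incidence(times, causes, cause_of_interest):
    """Nonparametric cumulative incidence for one cause.

    causes: 0 = censored, positive integers = event types.
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv = 1.0  # all-cause event-free probability just before current time
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j = i
        d_any = d_interest = 0
        # Count events of any cause and of the cause of interest at time t.
        while j < len(data) and data[j][0] == t:
            c = data[j][1]
            if c != 0:
                d_any += 1
                if c == cause_of_interest:
                    d_interest += 1
            j += 1
        if d_any:
            # Probability of surviving to t, then failing from this cause.
            cif += surv * d_interest / n_at_risk
            surv *= 1 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= j - i  # remove events and censored subjects at t
        i = j
    return curve


# Toy data: cause 1 = death from the disease, cause 2 = death from other causes.
times = [1, 2, 3, 4, 5]
causes = [1, 2, 0, 1, 2]
print(cumulative_incidence(times, causes, cause_of_interest=1))
```

Treating the competing events as censored and using one minus the Kaplan-Meier estimate instead would overstate the incidence of the event of interest, which is why a dedicated estimator is needed.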

If I were a Bayesian, I would be delighted with the new additions to the MCMC procedure for fitting Bayesian models!

Bayesian methods are no longer a competing school of thought in the statistical world but a powerful framework for data analysis that provides benefit to all practicing statisticians. We are receiving tremendous positive feedback from organizations around the world about the comprehensiveness of the MCMC procedure, as well as the built-in Bayesian capabilities in procedures such as GENMOD and PHREG that make standard Bayesian analyses simple to request.

In the 14.1 release, the MCMC procedure includes additional sampling algorithms for continuous variables that can lead to significant improvements in sampling efficiency. In addition, this release includes support for leading and lagging variables, an ordinary differential equation solver, and a general integration function. These updates enable you to fit various flavors of state space models and pharmacokinetic models.

If I were an econometrician, I would be thrilled that SAS/ETS 14.1 has numerous new capabilities for me!

And finally, if I were the author of a book on categorical analysis using SAS, I would be ecstatic about the inclusion of adjacent category models in the LOGISTIC procedure, and the availability of alternative confidence limits for the odds ratio in PROC FREQ! (But then I would wonder if I needed to update my book.)

Check out all the new analytical capabilities in this release and you may find another feature that is the release gem for you.


About Author

Maura Stokes

Sr Director, Advanced Analytics R&D
