The DO Loop
Statistical programming in SAS with an emphasis on SAS/IML programs
How to simulate multivariate outliers
In simulation studies, sometimes you need to simulate outliers. For example, in a simulation study of regression techniques, you might want to generate outliers in the explanatory variables to see how the technique handles high-leverage points. This article shows how to generate outliers in multivariate normal data that are a
The geometry of multivariate versus univariate outliers
[Figure: Schematic diagram of outliers in bivariate normal data. The point 'A' has large univariate z scores but a small Mahalanobis distance. The point 'B' has a large Mahalanobis distance. Only 'B' is a multivariate outlier.]
An important concept in multivariate statistical analysis is the Mahalanobis distance. The Mahalanobis distance provides a way to measure how far away an observation is from the center of a sample while accounting for correlations in the data. The Mahalanobis distance is a good way to detect outliers in multivariate
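To make the idea concrete, here is a minimal sketch in plain Python (not the SAS/IML code the blog itself uses). It assumes a bivariate normal population with unit variances and a correlation of 0.9; the covariance matrix, the two points, and the helper name `mahalanobis_2d` are all hypothetical choices made for illustration. The point along the correlation direction has large z scores in each coordinate but a moderate Mahalanobis distance, while the point against the correlation direction has smaller z scores and a much larger distance.

```python
import math

def mahalanobis_2d(point, center, cov):
    """Mahalanobis distance of a 2-D point from `center`, given a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Closed-form inverse of a 2x2 matrix
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx = point[0] - center[0]
    dy = point[1] - center[1]
    # Quadratic form (x - mu)' * inv(Sigma) * (x - mu)
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)

# Hypothetical population: zero mean, unit variances, correlation 0.9
center = (0.0, 0.0)
cov = ((1.0, 0.9), (0.9, 1.0))

point_a = (2.5, 2.5)    # z scores of 2.5, but along the correlation direction
point_b = (1.5, -1.5)   # z scores of only 1.5, but against the correlation direction

dist_a = mahalanobis_2d(point_a, center, cov)
dist_b = mahalanobis_2d(point_b, center, cov)
print(f"A: {dist_a:.3f}   B: {dist_b:.3f}")
```

Under these assumed values, the distance for the second point is more than twice the distance for the first, even though its coordinates are closer to the center in the Euclidean sense. In practice the center and covariance would be estimated from the sample rather than fixed in advance.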
Truncate response surfaces
An analyst was using SAS to analyze some data from an experiment. He noticed that the response variable is always positive (such as volume, size, or weight), but his statistical model predicts some negative responses. He posted the data and asked if it is possible to modify the graph so
Interpolation vs extrapolation: the convex hull of multivariate data
Statisticians often emphasize the dangers of extrapolating from a univariate regression model. A common exercise in introductory statistics is to ask students to compute a model of population growth and predict the population far in the future. The students learn that extrapolating from a model can result in a nonsensical
The value of pi depends on how you measure distance
It's time to celebrate Pi Day! Every year on March 14th (written 3/14 in the US), math-loving folks celebrate "all things pi-related" because 3.14 is the three-decimal approximation to the mathematical constant, π. Although children learn that pi is approximately 3.14159..., the actual definition of π is the ratio of
How to detect SAS data sets that contain (or do not contain) character variables
A SAS programmer posted an interesting question on a SAS discussion forum. The programmer wanted to iterate over hundreds of SAS data sets, read in all the character variables, and then do some analysis. However, not every data set contains character variables, and SAS complains when you ask it to