The DO Loop
Statistical programming in SAS with an emphasis on SAS/IML programs
data:image/s3,"s3://crabby-images/e8e83/e8e83027596ea2053b061e7cb7e0a1607baa07d3" alt="Create your own version of Anscombe's quartet: Dissimilar data that have similar statistics"
I think every course in exploratory data analysis should begin by studying Anscombe's quartet. Anscombe's quartet is a set of four data sets, each containing N=11 observations, that have nearly identical descriptive statistics but very different graphical properties. They are a great reminder of why you should graph your data. You can read about
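The following minimal Python sketch (not SAS/IML) illustrates the "similar statistics" claim for the first two data sets of Anscombe's published quartet, which share the same x values. The helper `pearson_r` is a hypothetical name introduced for illustration. The means, variances, and correlations agree to about two or three decimal places, even though a scatter plot shows a roughly linear cloud for set I and a smooth curve for set II.

```python
from statistics import mean, variance

# Anscombe's data sets I and II share the same x values.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

def pearson_r(a, b):
    """Pearson correlation computed from centered sums of squares and crossproducts."""
    ma, mb = mean(a), mean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5

# Print mean, sample variance, and correlation with x for each data set.
for name, y in [("I", y1), ("II", y2)]:
    print(name, round(mean(y), 2), round(variance(y), 2), round(pearson_r(x, y), 3))
```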
data:image/s3,"s3://crabby-images/a00af/a00af8bcc7109a98e2c5d869d123c92bb0725805" alt="Efficient evaluation of a quadratic form"
A quadratic form is a second-degree polynomial that does not have any linear or constant terms. For multivariate polynomials, you can quickly evaluate a quadratic form by using the matrix expression x`*A*x, where x is a column vector and A is the square matrix of coefficients. This computation is straightforward in a matrix language such as SAS/IML. However, some computations in statistics
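As a small illustration, here is a Python sketch (not the SAS/IML code from the article) that evaluates x` A x by first forming the row vector x` A and then taking its dot product with x. The function names are hypothetical. For A = [[2, 1], [1, 3]] and x = (1, 2), the matrix expression equals the polynomial 2*x1^2 + 2*x1*x2 + 3*x2^2 evaluated at that point, which is 18.

```python
def row_times_matrix(x, A):
    """Return the row vector x` A, whose j-th entry is sum_k x[k] * A[k][j]."""
    return [sum(x[k] * A[k][j] for k in range(len(x))) for j in range(len(A[0]))]

def quad_form(x, A):
    """Evaluate the quadratic form x` A x as the dot product of (x` A) with x."""
    return sum(v * xj for v, xj in zip(row_times_matrix(x, A), x))

def quad_forms_by_row(X, A):
    """Evaluate the quadratic form separately for each row of the data matrix X."""
    return [quad_form(row, A) for row in X]

A = [[2, 1], [1, 3]]
print(quad_form([1, 2], A))
print(quad_forms_by_row([[1, 2], [0, 1], [1, 0]], A))
```

The same formula works for a nonsymmetric A. In that case the quadratic form depends only on the symmetric part (A + A`)/2.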
data:image/s3,"s3://crabby-images/7e153/7e15399ce30c891cc98fa71e0d4956b68c1791a4" alt="4 ways to compute an SSCP matrix"
In numerical linear algebra, there are often multiple ways to solve a problem, and each is useful in different contexts. In fact, one of the challenges in matrix computations is choosing among different algorithms, which often vary in their use of memory, data access, and speed. This article
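To make the trade-off concrete, here is a Python sketch of two standard ways to form the sums of squares and crossproducts (SSCP) matrix X`X. This is an illustration and is not necessarily one of the four methods in the article. The first method uses inner products of columns and needs all rows at once. The second method accumulates the outer product of each row with itself, so it needs only one row in memory at a time. The function names are hypothetical.

```python
def sscp_inner_products(X):
    """SSCP via column inner products: entry (i, j) = column i . column j."""
    p = len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    return [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(p)]
            for i in range(p)]

def sscp_outer_products(X):
    """SSCP via a running sum of row outer products (one pass over the rows)."""
    p = len(X[0])
    S = [[0] * p for _ in range(p)]
    for row in X:
        for i in range(p):
            for j in range(p):
                S[i][j] += row[i] * row[j]
    return S

X = [[1, 2], [3, 4], [5, 6]]
print(sscp_inner_products(X))
print(sscp_outer_products(X))
```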
data:image/s3,"s3://crabby-images/7e153/7e15399ce30c891cc98fa71e0d4956b68c1791a4" alt="Use the FLOOR-MOD trick to allocate items to groups"
Suppose you need to assign 100 patients equally among 3 treatment groups in a clinical study. Obviously, an exactly equal allocation is impossible because 3 does not evenly divide 100, but you can get close by assigning 34 patients to one group and 33 to each of the others. Mathematically,
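The arithmetic behind the 34/33/33 split can be sketched in Python as follows. This is a sketch of the general idea and may differ from the article's SAS code. FLOOR(100/3) = 33 gives the base size of every group, and MOD(100, 3) = 1 gives the number of groups that receive one extra item. The function names and the contiguous block assignment are illustrative assumptions.

```python
def group_sizes(n, k):
    """Split n items into k groups whose sizes differ by at most one."""
    base = n // k    # FLOOR(n/k): every group gets at least this many items
    extra = n % k    # MOD(n, k): this many groups get one additional item
    return [base + 1 if g < extra else base for g in range(k)]

def assign_groups(n, k):
    """Return a group label (1..k) for each item, filling groups in order."""
    return [g for g, size in enumerate(group_sizes(n, k), start=1)
            for _ in range(size)]

print(group_sizes(100, 3))
labels = assign_groups(100, 3)
print([labels.count(g) for g in (1, 2, 3)])
```

In a real study you would typically shuffle the labels before pairing them with patients.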
data:image/s3,"s3://crabby-images/03a8c/03a8c7c176f28243d25f1a6a39da6edafdef355b" alt="Convergence in mixed models: When the estimated G matrix is not positive definite"
I've previously written about how to deal with nonconvergence when fitting generalized linear regression models. Most procedures that fit generalized linear and mixed models use an iterative optimization process, such as maximizing the likelihood function, to estimate the parameters. The optimization might not converge, either because the initial guess is poor or because the model
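A matrix is positive definite exactly when its Cholesky factorization succeeds with strictly positive pivots. The following Python sketch uses that fact to check an estimated covariance matrix such as G. The function name and the zero tolerance are assumptions for illustration, and this is not how SAS procedures perform the check internally. A variance component that is estimated as zero is one common way for G to fail the test.

```python
import math

def is_positive_definite(G, tol=0.0):
    """Attempt a Cholesky factorization G = L L`; fail if any pivot is <= tol."""
    n = len(G)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = G[i][i] - s
                if pivot <= tol:
                    return False   # zero or negative pivot: not positive definite
                L[i][i] = math.sqrt(pivot)
            else:
                L[i][j] = (G[i][j] - s) / L[j][j]
    return True

print(is_positive_definite([[4, 2], [2, 3]]))   # a valid covariance matrix
print(is_positive_definite([[1, 0], [0, 0]]))   # second variance component is zero
```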
data:image/s3,"s3://crabby-images/05ccd/05ccd1ab6cc3a1062a94b2d764820ea286858a3b" alt="Matrix operations and BY groups"
Many SAS procedures support the BY statement, which enables you to perform an analysis for subgroups of the data set. Although the SAS/IML language does not have a built-in "BY statement," there are various techniques that enable you to perform a BY-group analysis. The two I use most often are
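The basic idea of a BY-group analysis without a BY statement can be sketched in Python: find the unique levels of the grouping variable, locate the observations in each level, and run the analysis on that subset. The function name and the choice of the mean as the "analysis" are hypothetical. This illustrates the general pattern and is not a reconstruction of the article's techniques.

```python
def by_group_means(groups, values):
    """Compute the mean of `values` separately for each level of `groups`."""
    result = {}
    for level in sorted(set(groups)):          # unique levels, in sorted order
        idx = [i for i, g in enumerate(groups) if g == level]  # rows in this group
        result[level] = sum(values[i] for i in idx) / len(idx)
    return result

groups = ["A", "B", "A", "C", "B"]
values = [1, 2, 3, 4, 6]
print(by_group_means(groups, values))
```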