SAS author's tip: the basic structure of efficient simulations in SAS

This week's SAS tip is from Rick Wicklin and his powerful new book Simulating Data with SAS. Rick is a principal researcher in computational statistics at SAS, where he develops and supports the IML procedure and the SAS/IML Studio application. Chances are you're already familiar with Rick's work - whether you've seen him present at conferences, used his books, or read his popular blog The DO Loop.

I hope you'll enjoy this week's free book excerpt. Rick also kindly provided a link to his blog post “Simulation in SAS: The slow way or the BY way,” which shows an example of what NOT to do in Base SAS.

The following excerpt is from SAS Press author Rick Wicklin's book “Simulating Data with SAS.” Copyright © 2013, SAS Institute Inc., Cary, North Carolina, USA. ALL RIGHTS RESERVED. (Please note that results may vary depending on your version of SAS software.)

6.4.1 The Basic Structure of Efficient Simulations in SAS

Recall that there are two basic techniques for simulating and analyzing data: the BY-group technique and the in-memory technique.

When you use the BY-group technique, do the following:

Identify each sample by a unique value of the SampleID variable. Use a BY statement to compute statistics for each BY group.
Suppress all output during the BY-group analysis. Many procedures have a NOPRINT option in the PROC statement. For procedures that do not, use the method described in Section 6.4.2 to suppress ODS output.
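
The two steps above can be sketched as follows. This is a minimal illustration rather than code taken from the excerpt: the macro variables N and NumSamples, the seed, the data set names Sim and OutStats, and the choice of a standard normal distribution are all assumptions made for the example.

```sas
/* Illustrative sketch: names, seed, and distribution are assumptions */
%let N = 10;                 /* size of each sample */
%let NumSamples = 1000;      /* number of samples */

/* Step 1: simulate all samples in one data set, indexed by SampleID */
data Sim;
   call streaminit(123);
   do SampleID = 1 to &NumSamples;
      do i = 1 to &N;
         x = rand("Normal");   /* standard normal variate */
         output;
      end;
   end;
run;

/* Step 2: one procedure call analyzes every sample.
   NOPRINT suppresses the per-group output. */
proc means data=Sim noprint;
   by SampleID;
   var x;
   output out=OutStats mean=SampleMean;
run;

/* Summarize the approximate sampling distribution of the mean */
proc means data=OutStats mean std;
   var SampleMean;
run;
```

The DATA step writes the observations already ordered by SampleID, so the BY statement needs no prior sort. The OutStats data set has one row per sample, which is what a later analysis of the sampling distribution would read.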

When you use the SAS/IML in-memory technique, do the following:

Use the J function to allocate a vector (or matrix) to store the simulated data before you call the RANDGEN subroutine. This enables you to generate an entire sample (or even multiple samples) with a single call to the RANDGEN subroutine. Do not generate one random value at a time.
When possible, compute statistics for all samples with a single call. For example, the MEAN function computes the mean of each column in a matrix. The subscript reduction operator x[,:] computes the mean of each row in the x matrix.
If you are analyzing multivariate data, then it is often convenient to generate the data in a DO loop. At each iteration, generate a single sample and compute the statistic on that sample. This approach is described further in Chapter 8, “Simulating Data from Basic Multivariate Distributions.”
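
A minimal SAS/IML sketch of these guidelines is shown below. It is an illustration under stated assumptions, not code from the excerpt: the sample sizes, the seed, the variable names, and the example distributions (standard normal, and a bivariate normal with correlation 0.5) are all chosen for the example. It assumes a SAS/IML release that provides the MEAN, STD, and CORR functions.

```sas
/* Illustrative sketch: names, seed, and distributions are assumptions */
proc iml;
N = 10;                        /* size of each sample */
NumSamples = 1000;             /* number of samples */
call randseed(123);

/* Univariate case: allocate first, then fill with a single call */
x = j(NumSamples, N);          /* each row will hold one sample */
call randgen(x, "Normal");     /* generates all samples at once */
rowMeans = x[,:];              /* mean of each row: one per sample */
est = mean(rowMeans);          /* mean of the sample means */
sd = std(rowMeans);            /* standard deviation of the sample means */
print est sd;

/* Multivariate case: generate and analyze one sample per iteration */
mu = {0 0};
Sigma = {1 0.5, 0.5 1};
r = j(NumSamples, 1);          /* allocate the result vector */
do i = 1 to NumSamples;
   y = randnormal(N, mu, Sigma);   /* N x 2 sample */
   c = corr(y);                    /* 2 x 2 correlation matrix */
   r[i] = c[1,2];                  /* sample correlation */
end;
avgCorr = mean(r);
print avgCorr;
quit;
```

In the univariate part, each row of x is one sample, so the row means form the approximate sampling distribution directly. The loop in the multivariate part runs once per sample, not once per random value, so it stays consistent with the advice above.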

You can read more about Rick Wicklin's book Simulating Data with SAS here.
