The whys and hows of simulating data

When I told a friend that the title of my new book is Simulating Data with SAS, she asked, “Why would anyone want to simulate data?” To her, data are measured or surveyed. Data tell us how big, how often, and how many. Data indicate people’s opinions about politics and breakfast cereals. Data are real.

So why simulate data? It’s an excellent question, which I address in Chapters 4 and 5 of my book. The short answer is that simulation offers a way to understand how sampling variability affects a statistic. In particular, simulation can tell you how the value of a statistic might change if you were to collect another sample of the same size. Simulation also enables you to understand how statistical techniques perform on data that are skewed or that are contaminated by outliers.
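To make the idea of sampling variability concrete, here is a minimal sketch. It is written in Python rather than SAS and is not taken from the book; the function name, the exponential population, the sample size, and the seed are all assumptions chosen for illustration. It repeatedly draws samples of the same size from a skewed population and records the sample mean each time, so the spread of those means shows how much the statistic might change from one sample to the next.

```python
import random
import statistics

def simulate_sample_means(n, num_samples, seed=12345):
    """Draw num_samples samples of size n from a skewed population
    (exponential with mean 1; an illustrative assumption) and
    return the mean of each sample."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = []
    for _ in range(num_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# 2,000 simulated samples, each of size 50
means = simulate_sample_means(n=50, num_samples=2000)

# The center and spread of the simulated means approximate the
# sampling distribution of the mean for samples of this size.
print("average of sample means:  ", round(statistics.fmean(means), 3))
print("std. dev. of sample means:", round(statistics.stdev(means), 3))
```

For this population the standard deviation of the sample means should be close to the theoretical standard error, 1/sqrt(50). The same approach applies to statistics for which no such formula is available.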

Professional statisticians understand the value of simulating data, so most of my book is concerned with how to simulate data efficiently. Internet discussion forums are filled with examples of statistical programmers who lament, “My simulation is taking too long to run! Can someone please help?” My book describes how to plan a simulation study and includes strategies for efficient and effective simulation.

Advanced practitioners will enjoy the chapters that show how to simulate univariate and multivariate data with specified properties. Do you need to simulate correlated data? I provide techniques for simulating from several important multivariate distributions. Are you designing an experiment and need to determine the sample size that will enable you to reject the null hypothesis, when a specified alternative is true, with a required power? There are several examples of this type.
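The sample-size question can be sketched in a few lines. The following is an illustration in Python, not SAS, and it is not one of the book's examples; the function name, the one-sample z-test with known standard deviation, the effect size of 0.5, and the candidate sample sizes are all assumptions. For each sample size, the code simulates many data sets under the alternative hypothesis and reports the fraction of them in which the test rejects the null hypothesis. That fraction is the estimated power.

```python
import random
import statistics

def estimated_power(n, effect, num_sims=2000, alpha=0.05, seed=2013):
    """Estimate the power of a two-sided one-sample z-test of H0: mu = 0
    (sigma = 1 assumed known) when the true mean equals `effect`."""
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(num_sims):
        # Simulate one data set under the alternative hypothesis
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        # z statistic: (sample mean - 0) / (sigma / sqrt(n)) with sigma = 1
        z = statistics.fmean(sample) * n ** 0.5
        if abs(z) > z_crit:
            rejections += 1
    return rejections / num_sims

# Estimated power for several candidate sample sizes
for n in (10, 20, 40, 80):
    print(n, estimated_power(n, effect=0.5))
```

Reading down the output, you would choose the smallest sample size whose estimated power meets the requirement. Setting `effect=0` gives a quick check that the rejection rate is close to the significance level.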

A reviewer stated, “This book is a powerful learning tool and an indispensable aid to anyone who wants to use simulation methods in SAS.” I hope so. The book includes hundreds of SAS programs and about 80 graphs that visualize the results of computations. There are about 130 exercises. Every program in the book is available for free from the book’s Web page.

Teachers can use the examples in Simulating Data with SAS to explain sampling variability. Students and researchers can use the strategies to plan simulation studies as part of their research projects. Practicing statisticians can use the techniques to understand the variability of complex statistics for which theoretical results are unavailable. And SAS programmers whose simulations are “taking too long to run” can now consult a reference book that shows how to simulate data efficiently.

About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.
