In my recent article on simulating Buffon's needle experiment, I computed the "running mean" of a series of values by using a single call to the CUSUM function in the SAS/IML language. For example, the following SAS/IML statements define a RunningMean function, generate 1,000 random normal values, and compute the
Search Results: simulation (461)
It is "well known" that the pairwise deletion of missing values and the resulting computation of correlations can lead to problems in statistical computing. I have previously written about this phenomenon in my article "When is a correlation matrix not a correlation matrix." Specifically, consider the symmetric array whose elements
Hello, 2012! It's a New Year and I'm flushed with ideas for new blog articles. (You can also read about The DO Loop's most popular posts of 2011.) The fundamental purpose of my blog is to present tips and techniques for writing efficient statistical programs in SAS. I pledge to
At the beginning of 2011, I made four New Year's resolutions for my blog. As the year draws to a close, it's time to see how I did: Resolution: 100 blog posts in 2011: Completed. I blew by this goal by posting 165 articles. I recently compiled a list of
Since this is a blog about statistical programming and analysis, I am always looking for data to analyze. As 2011 ends, I look back on the 165 blog entries that I published since 01JAN2011. This article presents the 10 most popular posts, as determined by the number of people who
In simulation studies, the response variable is often a binary (or Bernoulli) variable. Often 1 is used to indicate "success" (or the occurrence of an event) whereas 0 indicates "failure" (or the absence of an event). For example, the following SAS/IML statements define a vector x of zeros and ones:
I was recently asked the following question: I am using bootstrap simulations to compute critical values for a statistical test. Suppose I have test statistic for which I want a p-value. How do I compute this? The answer to this question doesn't require knowing anything about bootstrap methods. An equivalent
“I really wish someone had shown me this function in SAS sooner, because it’s saved me a ton of time and effort,” said Brandi Rhoads as she opened her presentation at the Western Users of SAS Software (WUSS) conference in San Francisco.
Normal, Poisson, exponential—these and other "named" distributions are used daily by statisticians for modeling and analysis. There are four operations that are used often when you work with statistical distributions. In SAS software, the operations are available by using the following four functions, which are essential for every statistical programmer
Sometimes a population of individuals is modeled as a combination of subpopulations. For example, if you want to model the heights of individuals, you might first model the heights of males and females separately. The height of the population can then be modeled as a combination of the male and
NOTE: SAS stopped shipping the SAS/IML Studio interface in 2018. It is no longer supported, so this article is no longer relevant. When I write SAS/IML programs, I usually do my development in the SAS/IML Studio environment. Why? There are many reasons, but the one that I will discuss today
I've previously discussed how to find the root of a univariate function. This article describes how to find the root (zero) of a function of several variables by using Newton's method. There have been many papers, books, and dissertations written on the topic of root-finding, so why am I blogging
I recently blogged about how many times, on average, you must roll a die until you see all six faces. This question is a special case of the coupon collector's problem. My son noted that the expected value (the mean number of rolls) is not necessarily the best statistic to
"Dad? How many times do I have to roll a die until all six sides appear?" I stopped what I was doing to consider my son's question. Although I could figure out the answer mathematically, sometimes experiments are more powerful than math equations for showing how probability works. "Why don't
Tuesday's release of SAS 9.3 included the new SAS Forecast Server 4.1, which has several valuable enhancements: Combination (Ensemble) Models: A combination of forecasts using different forecasting techniques can outperform forecasts produced by using any single technique. Users can combine forecasts produced by many different models using several different combination
Welcome, SAS 9.3! I've already blogged about some interface and graphical changes that everyone should know about. Now I'll put on my statistical hat and mention a few 9.3 features that excite me, personally, as a data analyst and a statistical programmer: As a statistician, I am keen to try
As I was reviewing notes for my course "Data Simulation for Evaluating Statistical Methods in SAS," I realized that I haven't blogged about simulating categorical data in SAS. This article corrects that oversight. An Easy Way and a Harder Way SAS software makes it easy to sample from discrete "named"
My primary purpose in writing The DO Loop blog is to share what I know about statistical programming in general and about SAS programming in particular. But I also write the blog for various personal reasons, including the enjoyment of writing. The other day I encountered a concept on Ajay
In a previous blog post, I showed how you can use simulation to construct confidence intervals for ranks. This idea (from a paper by E. Marshall and D. Spiegelhalter), enables you to display a graph that compares the performance of several institutions, where "institutions" can mean schools, companies, airlines, or
In my article on computing confidence intervals for rankings, I had to generate p random vectors that each contained N random numbers. Each vector was generated from normal distribution with different parameters. This post compares two different ways to generate p vectors that are sampled from independent normal distributions. Sampling
In a previous post, I described how to compute means and standard errors for data that I want to rank. The example data (which are available for download) are mean daily delays for 20 US airlines in 2007. The previous post carried out steps 1 and 2 of the method
I recently posted an article about representing uncertainty in rankings on the blog of the ASA Section for Statistical Programmers and Analysts (SSPA). The posting discusses the importance of including confidence intervals or other indicators of uncertainty when you display rankings. Today's article complements the SSPA post by showing how
I believe I would have interviewed AnnMaria De Mars even if you hadn't sent me scads of e-mails and tweets suggesting her as a perfect candidate for the SAS Rock Stars series. I "met" AnnMaria when I started looking for SAS users on Twitter – nearly three years ago while
In my spare time, I enjoy browsing the StackOverflow discussion forum to see what questions people are asking about SAS, SAS/IML, and statistics. Last week, a statistics student asked for help with the following homework problem: I need to generate a one-dimensional random walk in which the step length and
Will I be in Las Vegas for for SAS Global Forum 2011? "You can bet on it!" In fact, I will be there and be square, since I am one of the statisticians who will represent SAS Research and Development. This year, I'm focused on three main activities: Presenting a
In a previous blog post, I described the rules for a tic-tac-toe scratch-off lottery game and showed that it is a bad idea to generate the game tickets by using a scheme that uses equal probabilities. Instead, cells that yield large cash awards must be assigned a small probability of
Because of this week's story about a geostatistician, Mohan Srivastava, who figured out how predict winning tickets in a scratch-off lottery, I've been thinking about scratch-off games. He discovered how to predict winners when he began to "wonder how they make these [games]." Each ticket has a set of "lucky
"What is the chance that two people in a room of 20 share initials?" This was the question posed to me by a colleague who had been taking notes at a meeting with 20 people. He recorded each person's initials next to their comments and, upon editing the notes, was
SAS/IML software is often used for sampling and simulation studies. For simulating data from univariate distributions, the RANDSEED and RANDGEN subroutines suffice to sample from a wide range of distributions. (I use the terms "sampling from a distribution" and "simulating data from a distribution" interchangeably.) For multivariate simulations, the IMLMLIB
Computing probabilities can be tricky. And if you are a statistician and you get them wrong, you feel pretty foolish. That's why I like to run a quick simulation just to make sure that the numbers that I think are correct are, in fact, correct. My last post of 2010