Contributed by Rick Wicklin, author of Statistical Programming with SAS/IML Software This year at SAS Global Forum 2011, I am presenting a statistical tutorial, "Data Simulation for Evaluating Statistical Methods in SAS". In this course, I show how to create data with known properties (such as skewed or heavy-tailed) and

# Author

I was inspired by Chris Hemedinger's blog posts about his daughter's science fair project. Explaining statistics to a pre-teenager can be a humbling experience. My 11-year-old son likes science. He recently set about trying to measure which of three projectile launchers is the most accurate. I think he wanted to

I don't know much about the SQL procedure, but I know that it is powerful. According to the SAS documentation for the SQL procedure, "PROC SQL can perform some of the operations that are provided by the DATA step and the PRINT, SORT, and SUMMARY procedures." Recently, a fellow blogger,

If you are a statistical programmer, sooner or later you have to compute a confidence interval. In the SAS/IML language, some beginning programmers struggle with forming a confidence interval. I don't mean that they struggle with the statistics (they know how to compute the relevant quantities), I mean that they

The Flowing Data blog posted some data about how much TV actors get paid per episode. About a dozen folks have created various visualizations of the data (see the comments in the Flowing Data blog), several of them very glitzy and fancy. One variable in the data is a categorical

Suppose that you want to create a matrix in SAS/IML software that has a special structure, such as a tridiagonal matrix. How do you do it? Or suppose that you want to find elements of a matrix A such that A[i,j] satisfies a certain condition. How do you get the

If you tell my wife that she's married to a statistical geek, she'll nod knowingly. She is used to hearing sweet words of affection such as You are more beautiful than Euler's identity. or My love for you is like the exponential function: increasing, unbounded, and transcendental. But those are

In a previous blog post, I described the rules for a tic-tac-toe scratch-off lottery game and showed that it is a bad idea to generate the game tickets by using a scheme that uses equal probabilities. Instead, cells that yield large cash awards must be assigned a small probability of

Because of this week's story about a geostatistician, Mohan Srivastava, who figured out how predict winning tickets in a scratch-off lottery, I've been thinking about scratch-off games. He discovered how to predict winners when he began to "wonder how they make these [games]." Each ticket has a set of "lucky

I enjoyed the Dataists' data-driven blog on the best numbers to choose in a Super Bowl betting pool. It reminded me of my recent investigation of which initials are most common. Because the Dataists' blog featured an R function that converts Arabic numerals into Roman numerals, the blog post also

The other day, someone asked me how to compute a matrix of pairwise differences for a vector of values. The person asking the question was using SQL to do the computation for 2,000 data points, and it was taking many hours to compute the pairwise differences. He asked if SAS/IML

On Friday, I posted an article about using spatial statistics to detect whether a pattern of points is truly random. That day, one of my colleagues asked me whether there are any practical applications of detecting spatial randomness or non-randomness. "Oh, sure," I replied, and rattled off a list of

When you pass a matrix as an parameter (argument) to a SAS/IML module, the SAS/IML language does not create a copy of the matrix. That approach, known as "calling by value," is inefficient. It is well-known that languages that implement call-by-value semantics suffer performance penalties. In the SAS/IML language, matrices

Last week I generated two kinds of random point patterns: one from the uniform distribution on a two-dimensional rectangle, the other by jittering a regular grid by a small amount. My show choir director liked the second method (jittering) better because of the way it looks on stage: there are

One of my New Year's resolutions is to learn a new area of statistics. I'm off to a good start, because I recently investigated an issue which started me thinking about spatial statistics—a branch of statistics that I have never formally studied. During the investigation, I asked myself: Given an

As Cat Truxillo points out in her recent blog post, some SAS procedures require data to be in a "long" (as opposed to "wide") format. Cat uses a DATA step to convert the data from wide to long format. Although there is nothing wrong with this approach, I prefer to

I sing in the SAS-sponsored VocalMotion show choir. It's like an adult version of Glee, except we have more pregnancies and fewer slushie attacks. For many musical numbers, the choreographer arranges the 20 performers on stage in an orderly manner, such as four rows of five singers. But every once

A histogram displays the number of points that fall into a specified set of bins. This blog post shows how to efficiently compute a SAS/IML vector that contains those counts. I stress the word "efficiently" because, as is often the case, a SAS/IML programmer has a variety of ways to

Have you ever wanted to compute the exact value of a really big number such as 200! = 200*199*...*2*1? You can do it—if you're willing to put forth some programming effort. This blog post shows you how. Jiangtang Hu's recent blog discusses his quest to compute large factorials in many programming languages.

The other day I needed to check that a sequence of numerical values was in strictly increasing order. My first thought was to sort the values and compare the sorted and original values, but I quickly discarded that approach because it does not detect duplicate values in a montonic (nondecreasing)

In a previous post, I described ways to create SAS/IML vectors that contain uniformly spaced values. The methods did not involve writing any loops. This post describes how to perform a similar operation: creating evenly spaced values on a two-dimensional grid. The DATA step solution is simple, but an efficient

"What is the chance that two people in a room of 20 share initials?" This was the question posed to me by a colleague who had been taking notes at a meeting with 20 people. He recorded each person's initials next to their comments and, upon editing the notes, was

A colleague posted some data on his internal SAS blog about key trends in the US Mobile phone industry, as reported by comScore. He graciously shared the data so that I could create a graph that visualizes the trends. The plot visualizes trends in the data: the Android phone is

When your data are in rows, but you need them in columns, use the matrix transpose function or operator. The same advice applies to data in columns that you want to be in rows. For example, the vectors created by the DO function and the index creation operator are row

A colleague related the following story: He was taking notes at a meeting that was attended by a fairly large group of people (about 20). As each person made a comment or presented information, he recorded the two-letter initials of the person who spoke. After the meeting was over, he

SAS/IML software is often used for sampling and simulation studies. For simulating data from univariate distributions, the RANDSEED and RANDGEN subroutines suffice to sample from a wide range of distributions. (I use the terms "sampling from a distribution" and "simulating data from a distribution" interchangeably.) For multivariate simulations, the IMLMLIB

It is often useful to create a vector with elements that follow an arithmetic sequence. For example, {1, 2, 3, 4} and {10, 30, 50, 70} are vectors with evenly spaced values. This post describes several ways to create vectors such as these. The SAS/IML language has two ways to

Computing probabilities can be tricky. And if you are a statistician and you get them wrong, you feel pretty foolish. That's why I like to run a quick simulation just to make sure that the numbers that I think are correct are, in fact, correct. My last post of 2010

The Junk Chart blog discusses problems with a chart which (poorly) presents statistics on the prevalence of shark attacks by different species. Here is the same data presented by overlaying two bar charts by using the SGPLOT procedure. I think this approach works well because the number of deaths is

Over at the SAS/IML Discussion Forum, someone posted an interesting question about how to create a special matrix that contains all combinations of zeros and ones for a given size. Specifically, the problem is as follows. Given an integer n ≥ 1, produce a matrix with 2n rows and n