One of my New Year's resolutions is to learn a new area of statistics. I'm off to a good start, because I recently investigated an issue which started me thinking about spatial statistics—a branch of statistics that I have never formally studied. During the investigation, I asked myself: Given an
As Cat Truxillo points out in her recent blog post, some SAS procedures require data to be in a "long" (as opposed to "wide") format. Cat uses a DATA step to convert the data from wide to long format. Although there is nothing wrong with this approach, I prefer to
I sing in the SAS-sponsored VocalMotion show choir. It's like an adult version of Glee, except we have more pregnancies and fewer slushie attacks. For many musical numbers, the choreographer arranges the 20 performers on stage in an orderly manner, such as four rows of five singers. But every once
A histogram displays the number of points that fall into a specified set of bins. This blog post shows how to efficiently compute a SAS/IML vector that contains those counts. I stress the word "efficiently" because, as is often the case, a SAS/IML programmer has a variety of ways to
Have you ever wanted to compute the exact value of a really big number such as 200! = 200*199*...*2*1? You can do it—if you're willing to put forth some programming effort. This blog post shows you how. Jiangtang Hu's recent blog discusses his quest to compute large factorials in many programming languages.
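The excerpt above does not show the method used in the post, so the following is only a language-neutral sketch of one common approach, written in Python rather than SAS/IML. It stores the number as a list of base-10 digits and multiplies by 2, 3, ..., n with schoolbook carrying. The function names `big_factorial_digits` and `big_factorial_string` are hypothetical and are not taken from the post.

```python
def big_factorial_digits(n):
    """Return the decimal digits of n!, least-significant digit first.

    Illustrative helper (hypothetical name): schoolbook multiplication
    on a list of base-10 digits, so no intermediate value ever exceeds
    roughly 10*n.
    """
    digits = [1]                      # represents the number 1
    for k in range(2, n + 1):
        carry = 0
        for i in range(len(digits)):
            t = digits[i] * k + carry # multiply one digit, add the carry
            digits[i] = t % 10        # keep the ones place
            carry = t // 10           # pass the rest to the next digit
        while carry > 0:              # leftover carry creates new digits
            digits.append(carry % 10)
            carry //= 10
    return digits


def big_factorial_string(n):
    """Return n! as a string of decimal digits, most-significant first."""
    return "".join(str(d) for d in reversed(big_factorial_digits(n)))


if __name__ == "__main__":
    s = big_factorial_string(200)
    print(len(s), "digits")
    print(s)
```

Python integers are already arbitrary precision, so the digit list is not strictly needed there. The point of the sketch is that every intermediate quantity stays small, which is why the same idea carries over to languages that only have double-precision numbers.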
The other day I needed to check that a sequence of numerical values was in strictly increasing order. My first thought was to sort the values and compare the sorted and original values, but I quickly discarded that approach because it does not detect duplicate values in a monotonic (nondecreasing)
In a previous post, I described ways to create SAS/IML vectors that contain uniformly spaced values. The methods did not involve writing any loops. This post describes how to perform a similar operation: creating evenly spaced values on a two-dimensional grid. The DATA step solution is simple, but an efficient
"What is the chance that two people in a room of 20 share initials?" This was the question posed to me by a colleague who had been taking notes at a meeting with 20 people. He recorded each person's initials next to their comments and, upon editing the notes, was
A colleague posted some data on his internal SAS blog about key trends in the US Mobile phone industry, as reported by comScore. He graciously shared the data so that I could create a graph that visualizes the trends. The plot visualizes trends in the data: the Android phone is
When your data are in rows, but you need them in columns, use the matrix transpose function or operator. The same advice applies to data in columns that you want to be in rows. For example, the vectors created by the DO function and the index creation operator are row
A colleague related the following story: He was taking notes at a meeting that was attended by a fairly large group of people (about 20). As each person made a comment or presented information, he recorded the two-letter initials of the person who spoke. After the meeting was over, he
SAS/IML software is often used for sampling and simulation studies. For simulating data from univariate distributions, the RANDSEED and RANDGEN subroutines suffice to sample from a wide range of distributions. (I use the terms "sampling from a distribution" and "simulating data from a distribution" interchangeably.) For multivariate simulations, the IMLMLIB
It is often useful to create a vector with elements that follow an arithmetic sequence. For example, {1, 2, 3, 4} and {10, 30, 50, 70} are vectors with evenly spaced values. This post describes several ways to create vectors such as these. The SAS/IML language has two ways to
Computing probabilities can be tricky. And if you are a statistician and you get them wrong, you feel pretty foolish. That's why I like to run a quick simulation just to make sure that the numbers that I think are correct are, in fact, correct. My last post of 2010
The Junk Chart blog discusses problems with a chart which (poorly) presents statistics on the prevalence of shark attacks by different species. Here is the same data presented by overlaying two bar charts by using the SGPLOT procedure. I think this approach works well because the number of deaths is
Over at the SAS/IML Discussion Forum, someone posted an interesting question about how to create a special matrix that contains all combinations of zeros and ones for a given size. Specifically, the problem is as follows. Given an integer n ≥ 1, produce a matrix with 2^n rows and n
It's a New Year and I'm ready to make some resolutions. Last year I launched this blog with my Hello, World post in which I said: In this blog I intend to discuss, describe, and disseminate ideas related to statistical programming with the SAS/IML language.... I will present tips and
In many families, siblings draw names so that each family member and spouse gives and receives exactly one present. This year there was a little bit of controversy when a family member noticed that once again she was assigned to give presents to me. This post includes my response to
When I wake up early to write my blog, I often wonder, "Is anyone going to read this?" Apparently so. I started writing The DO Loop in September, 2010. Since then, I've posted about 60 entries about statistical programming with SAS/IML software. Since this is a statistical blog, it is
When I finished writing my book, Statistical Programming with SAS/IML Software, I was elated. However, one small task still remained. I had to write the index. How Long Should an Index Be? My editor told me that SAS Press would send the manuscript to a professional editor who would index
Recently, I needed to detect whether a matrix consists entirely of missing values. I wrote the following module: proc iml; /** Module to detect whether all elements of a matrix are missing values. Works for both numeric and character matrices. Version 1 (not optimal) **/ start isMissing(x); if type(x)='C' then
NOTE: SAS stopped shipping the SAS/IML Studio interface in 2018. The references in this article to IMLPlus and SAS/IML Studio are no longer relevant. There are three kinds of programming errors: parse-time errors, run-time errors, and logical errors. It doesn't matter what language you are using (SAS/IML, MATLAB, R, C/C++,
Both covariance matrices and correlation matrices are used frequently in multivariate statistics. You can easily compute covariance and correlation matrices from data by using SAS software. However, sometimes you are given a covariance matrix, but your numerical technique requires a correlation matrix. Other times you are given a correlation matrix,
Sample covariance matrices and correlation matrices are used frequently in multivariate statistics. This post shows how to compute these matrices in SAS and use them in a SAS/IML program. There are two ways to compute these matrices: Compute the covariance and correlation with PROC CORR and read the results into
I enjoy reading about the Le Monde puzzles (and other topics!) at Christian Robert's blog. Recently he asked how to convert a number with s digits into a numerical vector where each element of the vector contains the corresponding digit (by place value). For example, if the number is 4321,
The SAS/IML language enables you to perform matrix-vector computations. However, it also provides a convenient "shorthand notation" that enables you to perform elementwise operations on rows or columns in a natural way. You might know that the SAS/IML language supports subscript reduction operators to compute basic rowwise or columnwise quantities.
My last post was a criticism of a statistical graph that appeared in Bloomberg Businessweek. Criticism is easy. Analysis is harder. In this post I re-analyze the data to present two graphics that I think should have replaced the one graphic in Businessweek. You can download the SAS program that
Recently I read a blog that advertised a data visualization competition. Under the heading "What Are We Looking For?" is a link to a 2007 Bloomberg Businessweek graph that visualizes how participation in online social media activities varies across age groups. The graph is reproduced below at a smaller scale:
Errors. We all make them. After all, “to err is human.” Or, as programmers often say, “To err is human, but to really foul things up requires a computer” (Farmer’s Almanac, 1978). This post describes how to interpret error messages from PROC IML that appear in the SAS log. The
I am thankful to be a statistical programmer. When I wake up in the morning, I am eager to start my day. I love statistics, programming, and working at SAS, and I write my blog to share that joy. This is a Golden Age for statistical programmers because theoretical ideas and