A SAS/IML user on a discussion forum was trying to read data into a SAS/IML matrix, but the data was so large that it would not fit into memory. (Recall that SAS/IML matrices are kept in RAM.) After a few questions, it turned out that the user was trying to
Author
A while ago I saw a blog post on how to simulate Bernoulli outcomes when the probability of generating a 1 (success) varies from observation to observation. I've done this often in SAS, both in the DATA step and in the SAS/IML language. For example, when simulating data that satisfied
When a categorical variable has dozens or hundreds of categories, it is often impractical and undesirable to create a bar chart that shows the counts for all categories. Two alternatives are popular: Display only the Top 10 or Top 20 categories. As I showed last week, to do this in
Sometimes a categorical variable has many levels, but you are only interested in displaying the levels that occur most frequently. For example, if you are interested in the number of times that a song was purchased on iTunes during the past week, you probably don't want a bar chart with
I am pleased to announce that this year at SAS Global Forum 2013 (San Francisco, April 27 to May 1, 2013) I am giving a free hands-on workshop (HOW) entitled "Getting Started with the SAS/IML Language." If you are not familiar with the very popular Hands-On Workshop series at SAS
It's the start of a new year. Have you made a resolution to be a better data analyst? A better SAS statistical programmer? To learn more about multivariate statistics? What better way to start the New Year than to read (or re-read!) the top 12 articles for statistical programmers from
In my previous post, I described how to implement an iterated function system (IFS) in the SAS/IML language to draw fractals. I used the famous Barnsley fern example to illustrate the technique. At the end of the article I issued a challenge: can you construct an IFS whose fractal attractor
Fractals. If you grew up in the 1980s or '90s and were interested in math and computers, chances are you played with computer generation of fractals. Who knows how many hours of computer time was spent computing Mandelbrot sets and Julia sets to ever-increasing resolutions? When I was a kid,
In a recent article on efficient simulation from a truncated distribution, I wrote some SAS/IML code that used the LOC function to find and exclude observations that satisfy some criterion. Some readers came up with an alternative algorithm that uses the REMOVE function instead of subscripts. I remarked in a
It seemed like an easy task. A SAS user asked me how to use the SGPLOT procedure to create a bar chart where the vertical axis shows percentages instead of counts. I assumed that there was some simple option that would change the scale of the vertical axis from counts
Frequently someone will post a question to the SAS Support Community that says something like this: I am trying to do [statistical task]and SAS issues an error and reports that my correlation matrix is not positive definite. What is going on and how can I complete [the task]? The statistical
Last week I wrote about using acceptance-rejection algorithms in vector languages to simulate data. The main point I made is that in a vector language it is efficient to generate many more variates than are needed, with the knowledge that a certain proportion will be rejected. In last week's article,
The LOC function is one of the most important functions in the SAS/IML language. The LOC function finds elements of a vector or matrix that satisfy some condition. For example, if you are going to apply a logarithmic transform to data, you can use the LOC function to find all
A few days ago on the SAS/IML Support Community, there was an interesting discussion about how to simulate data from a truncated Poisson distribution. The SAS/IML user wanted to generate values from a Poisson distribution, but discard any zeros that are generated. This kind of simulation is known as an
I was recently asked, "Does SAS support computing inverse hyperbolic trigonometric functions?" I was pretty sure that I had used the inverse hyperbolic trig functions in SAS, so I was surprised when I read the next sentence: "I ask because I saw a Usage Note that says these functions are
The other day I was constructing covariance matrices for simulating data for a mixed model with repeated measurements. I was using the SAS/IML BLOCK function to build up the "R-side" covariance matrix from smaller blocks. The matrix I was constructing was block-diagonal and looked like this: The matrix represents a
I recently encountered a SUGI30 paper by Chuck Kincaid entitled "Guidelines for Selecting the Covariance Structure in Mixed Model Analysis." I think Kincaid does a good job of describing some common covariance structures that are used in mixed models. One of the many uses for SAS/IML is as a language
The determinant of a matrix arises in many statistical computations, such as in estimating parameters that fit a distribution to multivariate data. For example, if you are using a log-likelihood function to fit a multivariate normal distribution, the formula for the log-likelihood involves the expression log(det(Σ)), where Σ is the
I was looking at some SAS documentation when I saw a Base SAS function that I never knew existed. The NWKDOM function returns the date for the nth occurrence of a weekday for the specified month and year. I surely could have used that function last spring when I blogged
There are a lot of useful probability distributions that are not featured in standard statistical textbooks. Some of them have distinctive names. In the past year I have had contact with SAS customers who use the Tweedie distribution, the slash distribution, and the PERT distribution. Often these distributions are used
What's in a name? As Shakespeare's Juliet said, "That which we call a rose / By any other name would smell as sweet." A similar statement holds true for the names of colors in SAS: "Rose" by any other name would look as red! SAS enables you to specify a
Sometimes a graph is more interpretable if you assign specific colors to categories. For example, if you are graphing the number of Olympic medals won by various countries at the 2012 London Olympics, you might want to assign the colors gold, silver, and bronze to represent first-, second-, and third-place
The New York Times has an excellent staff that produces visually interesting graphics for the general public. However, because their graphs need to be understood by all Times readers, the staff sometimes creates a complicated infographic when a simpler statistical graph would show the data in a clearer manner. A
Last week I wrote a SAS/IML program that computes the odds of winning the game of craps. I noted that the program remains valid even if the dice are not fair. For convenience, here is a SAS/IML function that computes the probability of winning at craps, given the probability vector
It is easy to simulate data that is uniformly distributed in the unit cube for any dimension. However, it is less obvious how to generate data in the unit simplex. The simplex is the set of points (x1,x2,...,xd) such that Σi xi = 1 and 0 ≤ xi ≤ 1
Gambling games that use dice, such as the game of "craps," are often used to demonstrate the laws of probability. For two dice, the possible rolls and probability of each roll are usually represented by a matrix. Consequently, the SAS/IML language makes it easy to compute the probabilities of various
John D. Cook posted a story about Hardy, Ramanujan, and Euler and discusses a conjecture in number theory from 1937. Cook says, Euler discovered 635,318,657 = 158^4 + 59^4 = 134^4 + 133^4 and that this was the smallest [integer]known to be the sum of two fourth powers in two
Did you know that you can index into SAS/IML matrices by using unique strings that you assign via the MATTRIB statement? The MATTRIB statement associates various attributes to a matrix. Usually, these attributes are only used for printing, but you can also use the ROWNAME= and COLNAME= attributes to subset
I was recently flipping through Ross' Simulation (2006, 4th Edition) and saw the following exercise: Let N be the minimum number of draws from a uniform distribution [until the sum of the variates]exceeds 1. What is the expected value of N? Write a simulation to estimate the expected value. For
Sometimes it is useful to group observations based on the values of some variable. Common schemes for grouping include binning and using quantiles. In the binning approach, a variable is divided into k equal intervals, called bins, and each observation is assigned to a bin. In this scheme, the size