A SAS programmer asked for a list of SAS/IML functions that operate on the columns of an n x p matrix and return a 1 x p row vector of results. The functions that behave this way tend to compute univariate descriptive statistics such as the mean, median, standard deviation, and quantiles. The following
Author
I previously wrote about the best way to suppress output from SAS procedures. Suppressing output is necessary in simulation and bootstrap analyses, and it is useful in other contexts as well. In my previous article, I wrote, "many programmers use ODS _ALL_ CLOSE as a way to suppress output, but
SAS procedures can produce a lot of output, but you don't always want to see it all. In simulation and bootstrap studies, you might analyze 10,000 samples or resamples. Usually you are not interested in seeing the results of each analysis displayed on your computer screen. Instead, you want to
When I was an undergraduate physic major, my favorite professor would start each class with a joke or pun. One day he began class with a paraphrase of a famous quote from the movie Star Trek 4: The Voyage Home (the one with the whales). "Today," my professor said, imitating
North Carolina is a state that requires yearly inspections of motor vehicles. An inspection checks for safety features (lights, brakes, tires,....) as well as checking vehicle emissions to ensure that vehicles meet air pollution standards. I recently had a car inspected and noticed a pie chart on the inspection's summary
Perhaps you saw the headlines earlier this week about the fact that it has been nine years since the last major hurricane (category 3, 4, or 5) hit the US coast. According to a post on the GeoSpace blog, which is published by the American Geophysical Union (AGU), researchers ran
I always learn something new when I attend SAS Global Forum, and it often results from an informal conversations in The Quad or in a hallway. Last week a SAS customer described a scenario that he wanted to implement as part of an analysis of some genetic data. To oversimplify
A common task in data analysis is to locate observations that satisfy multiple criteria. For example, you might want to locate all zip codes in certain counties within specified states. The SAS DATA step contains the powerful WHERE statement, which enables you to extract a subset of data that satisfy
Last week I attended SAS Global Forum 2015 in Dallas. It was packed with almost 5,000 attendees. I learned many interesting things at the conference, including the fact that you need to arrive EARLY to the statistical presentations if you want to find an empty seat! It was gratifying to
Did you know that if you have set multiple titles in SAS, that there is an easy way to remove them? For example, suppose that you've written the following statements, which call the TITLE statement to set three titles: title "A Great Big Papa Title"; title2 "A Medium-sized Mama Title";
Suppose that you compute the correlation matrix (call it R1) for a set of variables x1, x2, ..., x8. For some reason, you later want to compute the correlation matrix for the variables in a different order, maybe x2, x1, x7,..., x6. Do you need to go back to the
Sometimes different communities use the same name for different objects. To a soldier, "boots" are rugged, heavy, high-top foot coverings. To a soccer (football) player, "boots" are lightweight cleats. So it is with the term "waterfall plot." To researchers in the medical field, a "waterfall plot" is a sorted bar
A customer asked: How do we go about summing a finite series in SAS? For example, I want to compute for various integers n ≥ 3. I want to output two columns, one for the natural numbers and one for the summation of the series. Summations arise often in statistical
In clinical trials, a waterfall plot is often used to indicate how patients in the study responded to treatment. In oncology trials, the response variable might be the percent change in the size of a tumor from the individual's baseline value at the start of the trial. The percent change
Last week I was chatting with some mathematicians and I mentioned the blog post that I wrote last year on the distribution of Pythagorean triples. In my previous article, I showed that there is an algorithm that uses matrix multiplication to generate every primitive Pythagorean triple by starting with the
Today is my 600th blog post for The DO Loop. I have written about many topics that are related to statistical programming, math, statistics, simulation, numerical analysis, matrix computations, and more. The right sidebar of my blog contains a tag cloud that links to many topics. What topics do you,
A common question from statistical programmers is how to compute the rank of a matrix in SAS. Recall that the rank of a matrix is defined as the number of linearly independent columns in the matrix. (Equivalently, the number of linearly independent rows.) This article describes how to compute the
The 2015 SAS Global Forum is in Dallas, TX, and I'll be there. There are many talks to see and people to meet, so thank goodness for the agenda builder, which enables you to create a schedule in advance. I always enjoy talking with SAS customers about statistics, simulations, matrix
The Monty Hall Problem is one of the most famous problems in elementary probability. It is famous because the correct solution is counter-intuitive and because it caused an uproar when it appeared in the "Ask Marilyn" column in Parade magazine in 1990. Discussing the problem has been known to create
There has been a spate of recent high-profile airline crashes (Malaysia Airlines, TransAsia Airways, Germanwings,...) so I was surprised when I saw a time series plot of the number of airline crashes by year, which indicates that the annual number of airline crashes has been decreasing since 1993. The data
There's "big," and then there is "factorial big." If you have k items, the number of permutations is "k factorial," which is written as k!. The factorial function gets big fast. For example, the value of k! for several values of k is shown in the following table. You can
The title of this article makes no sense. How can the number of elements (in fact, the number of anything!) not be a whole number? In fact, it can't. However, the title refers to the fact that you might compute a quantity that ought to be an integer, but is
Imagine that you have one million rows of numerical data and you want to determine if a particular "target" value occurs. How might you find where the value occurs? For univariate data, this is an easy problem. In the SAS DATA step you can use a WHERE clause or a
This article show how to run a SAS program in batch mode and send parameters into the program by specifying the parameters when you run SAS from a command line interface. This technique has many uses, one of which is to split a long-running SAS computation into a series of
Saturday, March 14, 2015, is Pi Day, and this year is a super-special Pi Day! This is your once-in-a-lifetime chance to celebrate the first 10 digits of pi (π) by doing something special on 3/14/15 at 9:26:53. Apologies to my European friends, but Pi Day requires that you represent dates
Sometimes I get contacted by SAS/IML programmers who discover that the SAS/IML language does not provide built-in support for multiplication of matrices that have missing values. (SAS/IML does support elementwise operations with missing values.) I usually respond by asking what they are trying to accomplish, because mathematically matrix multiplication with
I often blog about the usefulness of vectorization in the SAS/IML language. A one-sentence summary of vectorization is "execute a small number of statements that each analyze a lot of data." In general, for matrix languages (SAS/IML, MATLAB, R, ...) vectorization is more efficient than the alternative, which is to
In my previous post, I showed how to approximate a cumulative density function (CDF) by evaluating only the probability density function. The technique uses the trapezoidal rule of integration to approximate the CDF from the PDF. For common probability distributions, you can use the CDF function in Base SAS to
Evaluating a cumulative distribution function (CDF) can be an expensive operation. Each time you evaluate the CDF for a continuous probability distribution, the software has to perform a numerical integration. (Recall that the CDF at a point x is the integral under the probability density function (PDF) where x is
Last week I received a message from SAS Technical Support saying that a customer's IML program was running slowly. Could I look at it to see whether it could be improved? What I discovered is a good reminder about the importance of vectorizing user-defined modules. The program in this blog