More than a month ago I wrote a first article in response to an interesting article by Charlie H. titled Top 10 most powerful functions for PROC SQL. In that article I described SAS/IML equivalents to the MONOTONIC, COUNT, N, FREQ, and NMISS Functions in PROC SQL. In this article,
Author
The most common way to read observations from a SAS data set into SAS/IML matrices is to read all of the data at once by using the ALL clause in the READ statement. However, the READ statement also has options that do not require holding all of the observations in
In last week's article on how to create a funnel plot in SAS, I wrote the following comment: I have not adjusted the control limits for multiple comparisons. I am doing nine comparisons of individual means to the overall mean, but the limits are based on the assumption that I'm
The log transformation is one of the most useful transformations in data analysis. It is used as a transformation to normality and as a variance stabilizing transformation. A log transformation is often used as part of exploratory data analysis in order to visualize (and later model) data that ranges over
One of the advantages of programming in the SAS/IML language is its ability to transform data vectors with a single statement. For example, in data analysis, the log and square-root functions are often used to transform data so that the transformed data have approximate normality. The following SAS/IML statements create
Last week I showed how to create a funnel plot in SAS. A funnel plot enables you to compare the mean values (or rates, or proportions) of many groups to some other value. The group means are often compared to the overall mean, but they could also be compared to
Last week I presented the GSR algorithm, a statistical model of a riffle shuffle. In the model, a deck of n cards is split into two parts according to the binomial distribution. Each piece has roughly n/2 cards. Then cards are dropped from the two stacks according to the number
In a previous post, I showed how to read data from a SAS data set into SAS/IML matrices or vectors. This article shows the converse: how to use the CREATE, APPEND, and CLOSE statements to create a SAS data set from data stored in a matrix or in vectors. Creating
In a previous blog post, I showed how you can use simulation to construct confidence intervals for ranks. This idea (from a paper by E. Marshall and D. Spiegelhalter), enables you to display a graph that compares the performance of several institutions, where "institutions" can mean schools, companies, airlines, or
Last week I was a SAS consultant. Oh, not a real consultant, but for two hours in the Support and Demo room I stood under the "Analytics" sign and in front of rollshades about SAS/STAT, SAS/QC, and SAS/IML. Customers can walk up and ask any question they want. And ask
I recently returned from a five-day conference in Las Vegas. On the way there, I finally had time to read a classic statistical paper: Bayer and Diaconis (1992) describes how many shuffles are needed to randomize a deck of cards. Their famous result that it takes seven shuffles to randomize
"Convergence after 23 iterations to (1.23, 4.56)." That's the message that I want to print at the end of a program. The problem, of course, is that when I write the program, I don't know how many iterations an algorithm requires nor the value to which an algorithm converges. How
At the beginning of 2011, I heard about the Dow Piano, which was created by CNNMoney.com. The Dow Piano visualizes the performance of the Dow Jones industrial average in 2010 with a line plot, but also adds an auditory component. As Bård Edlund, Art Director at CNNMoney.com, said, The daily
In a previous blog post about computing confidence intervals for rankings, I inadvertently used the VAR function in SAS/IML 9.22, without providing equivalent functionality for those readers who are running an earlier version of SAS/IML software. (Thanks to Eric for pointing this out.) If you are using a version of
When comparing scores from different subjects, it is often useful to rank the subjects. A rank is the order of a subject when the associated score is listed in ascending order. I've written a few articles about the importance of including confidence intervals when you display rankings, but I haven't
In my article on computing confidence intervals for rankings, I had to generate p random vectors that each contained N random numbers. Each vector was generated from normal distribution with different parameters. This post compares two different ways to generate p vectors that are sampled from independent normal distributions. Sampling
Editor's Note: This article was an April Fool's prank from 2011. The entire article is fake. Today, SAS, the leader in business analytics announces significant changes to two popular SAS blogs, The DO Loop (written by Rick Wicklin) and The SAS Dummy (previously written by Chris Hemedinger). The two blogs
This week, I posted the 100th article to The DO Loop. To celebrate, I'm going to analyze the content of my first 100 articles. In December 2010, I compiled a list of The DO Loop's most-read posts, so I won't repeat that exercise. Instead, I thought it would be interesting
In a previous post, I described how to compute means and standard errors for data that I want to rank. The example data (which are available for download) are mean daily delays for 20 US airlines in 2007. The previous post carried out steps 1 and 2 of the method
When you create a character matrix in SAS/IML software, the initial values determine the number of characters that can fit into any element of the matrix. For example, the following statements define a 1x3 character matrix: proc iml; m = {"Low" "Med" "High"}; After the matrix is defined, at most
I recently posted an article about representing uncertainty in rankings on the blog of the ASA Section for Statistical Programmers and Analysts (SSPA). The posting discusses the importance of including confidence intervals or other indicators of uncertainty when you display rankings. Today's article complements the SSPA post by showing how
I recently blogged about how to eliminate a macro loop in favor of using SAS/IML language statements. The purpose of the program was to extract N 3x3 matrices from a big 3Nx3 matrix. The main portion of my PROC IML program looked something like this: proc iml; ... do i=0
Statistical programmers can be creative and innovative. But when it comes to choosing names of variables, often x1, x2, x3,... works as well as any other choice. In this blog post, I have two tips that are related to constructing variable names of the form x1, x2,..., xn. Both tips
Loony. Zany. Brilliant. Hysterical. Those are some of the adjectives I use to describe The Far Side® cartoons by Gary Larson from the 1980s and early '90s. I recently rediscovered an old book, The Far Side Gallery 2, which collects some of the best of Larson's wonderfully wacky cartoons. Every
In a previous blog post, I showed how to use the SAS/IML SORT and SORTNDX subroutines to sort rows of a matrix according to the values of one or more columns. There is another common situation in which you might need to sort a matrix: you compute a statistic for
Sorting is a fundamental operation in statistical programming, and most SAS programmers are familiar with PROC SORT for sorting data sets. But did you know that you can also sort rows of a SAS/IML matrix according to the value of one or more columns? This post shows how. Sorting a
In my spare time, I enjoy browsing the StackOverflow discussion forum to see what questions people are asking about SAS, SAS/IML, and statistics. Last week, a statistics student asked for help with the following homework problem: I need to generate a one-dimensional random walk in which the step length and
Several times a year, I am contacted by a SAS account manager who tells me that a customer has asked whether it is possible to convert a MATLAB program to the SAS/IML language. Often the customer has an existing MATLAB program and wants to include the computation as part of
String comparisons in SAS software are case-sensitive. For example, the uppercase letter "F" and lowercase letter "f" are treated as unique characters. When these two letters represent the same condition (for example, a female patient), the strings need to be handled in a case-insensitive manner, and a SAS programmer might
Do you have many points in your scatter plots that overlap each other? If so, your graph exhibits overplotting. Overplotting occurs when many points have similar coordinates. For example, the following scatter plot (which is produced by using the ODS statistical graphics procedure, SGPLOT) displays 12,000 points, many of which