When comparing scores from different subjects, it is often useful to rank the subjects. A rank is the order of a subject when the associated score is listed in ascending order. I've written a few articles about the importance of including confidence intervals when you display rankings, but I haven't
Author
In my article on computing confidence intervals for rankings, I had to generate p random vectors that each contained N random numbers. Each vector was generated from normal distribution with different parameters. This post compares two different ways to generate p vectors that are sampled from independent normal distributions. Sampling
Editor's Note: This article was an April Fool's prank from 2011. The entire article is fake. Today, SAS, the leader in business analytics announces significant changes to two popular SAS blogs, The DO Loop (written by Rick Wicklin) and The SAS Dummy (previously written by Chris Hemedinger). The two blogs
This week, I posted the 100th article to The DO Loop. To celebrate, I'm going to analyze the content of my first 100 articles. In December 2010, I compiled a list of The DO Loop's most-read posts, so I won't repeat that exercise. Instead, I thought it would be interesting
In a previous post, I described how to compute means and standard errors for data that I want to rank. The example data (which are available for download) are mean daily delays for 20 US airlines in 2007. The previous post carried out steps 1 and 2 of the method
When you create a character matrix in SAS/IML software, the initial values determine the number of characters that can fit into any element of the matrix. For example, the following statements define a 1x3 character matrix: proc iml; m = {"Low" "Med" "High"}; After the matrix is defined, at most
I recently posted an article about representing uncertainty in rankings on the blog of the ASA Section for Statistical Programmers and Analysts (SSPA). The posting discusses the importance of including confidence intervals or other indicators of uncertainty when you display rankings. Today's article complements the SSPA post by showing how
I recently blogged about how to eliminate a macro loop in favor of using SAS/IML language statements. The purpose of the program was to extract N 3x3 matrices from a big 3Nx3 matrix. The main portion of my PROC IML program looked something like this: proc iml; ... do i=0
Statistical programmers can be creative and innovative. But when it comes to choosing names of variables, often x1, x2, x3,... works as well as any other choice. In this blog post, I have two tips that are related to constructing variable names of the form x1, x2,..., xn. Both tips
Loony. Zany. Brilliant. Hysterical. Those are some of the adjectives I use to describe The Far Side® cartoons by Gary Larson from the 1980s and early '90s. I recently rediscovered an old book, The Far Side Gallery 2, which collects some of the best of Larson's wonderfully wacky cartoons. Every
In a previous blog post, I showed how to use the SAS/IML SORT and SORTNDX subroutines to sort rows of a matrix according to the values of one or more columns. There is another common situation in which you might need to sort a matrix: you compute a statistic for
Sorting is a fundamental operation in statistical programming, and most SAS programmers are familiar with PROC SORT for sorting data sets. But did you know that you can also sort rows of a SAS/IML matrix according to the value of one or more columns? This post shows how. Sorting a
In my spare time, I enjoy browsing the StackOverflow discussion forum to see what questions people are asking about SAS, SAS/IML, and statistics. Last week, a statistics student asked for help with the following homework problem: I need to generate a one-dimensional random walk in which the step length and
Several times a year, I am contacted by a SAS account manager who tells me that a customer has asked whether it is possible to convert a MATLAB program to the SAS/IML language. Often the customer has an existing MATLAB program and wants to include the computation as part of
String comparisons in SAS software are case-sensitive. For example, the uppercase letter "F" and lowercase letter "f" are treated as unique characters. When these two letters represent the same condition (for example, a female patient), the strings need to be handled in a case-insensitive manner, and a SAS programmer might
Do you have many points in your scatter plots that overlap each other? If so, your graph exhibits overplotting. Overplotting occurs when many points have similar coordinates. For example, the following scatter plot (which is produced by using the ODS statistical graphics procedure, SGPLOT) displays 12,000 points, many of which
I don't use the SAS macro language very often. Because the SAS/IML language has statements for looping and evaluating expressions, I rarely write a macro function as part of a SAS/IML programs. Oh, sure, I use the %LET statement to define global constants, but I seldom use the %DO and
If you haven't signed up for SAS Global Forum 2011 in Las Vegas, you'd better get moving: February 28 is the last day for early registration and the discounted hotel prices. You should also sign up for the pre-conference statistical tutorials, which are filling up fast! I was tempted to
I was inspired by Chris Hemedinger's blog posts about his daughter's science fair project. Explaining statistics to a pre-teenager can be a humbling experience. My 11-year-old son likes science. He recently set about trying to measure which of three projectile launchers is the most accurate. I think he wanted to
I don't know much about the SQL procedure, but I know that it is powerful. According to the SAS documentation for the SQL procedure, "PROC SQL can perform some of the operations that are provided by the DATA step and the PRINT, SORT, and SUMMARY procedures." Recently, a fellow blogger,
If you are a statistical programmer, sooner or later you have to compute a confidence interval. In the SAS/IML language, some beginning programmers struggle with forming a confidence interval. I don't mean that they struggle with the statistics (they know how to compute the relevant quantities), I mean that they
The Flowing Data blog posted some data about how much TV actors get paid per episode. About a dozen folks have created various visualizations of the data (see the comments in the Flowing Data blog), several of them very glitzy and fancy. One variable in the data is a categorical
Suppose that you want to create a matrix in SAS/IML software that has a special structure, such as a tridiagonal matrix. How do you do it? Or suppose that you want to find elements of a matrix A such that A[i,j] satisfies a certain condition. How do you get the
If you tell my wife that she's married to a statistical geek, she'll nod knowingly. She is used to hearing sweet words of affection such as You are more beautiful than Euler's identity. or My love for you is like the exponential function: increasing, unbounded, and transcendental. But those are
In a previous blog post, I described the rules for a tic-tac-toe scratch-off lottery game and showed that it is a bad idea to generate the game tickets by using a scheme that uses equal probabilities. Instead, cells that yield large cash awards must be assigned a small probability of
Because of this week's story about a geostatistician, Mohan Srivastava, who figured out how predict winning tickets in a scratch-off lottery, I've been thinking about scratch-off games. He discovered how to predict winners when he began to "wonder how they make these [games]." Each ticket has a set of "lucky
I enjoyed the Dataists' data-driven blog on the best numbers to choose in a Super Bowl betting pool. It reminded me of my recent investigation of which initials are most common. Because the Dataists' blog featured an R function that converts Arabic numerals into Roman numerals, the blog post also
The other day, someone asked me how to compute a matrix of pairwise differences for a vector of values. The person asking the question was using SQL to do the computation for 2,000 data points, and it was taking many hours to compute the pairwise differences. He asked if SAS/IML
On Friday, I posted an article about using spatial statistics to detect whether a pattern of points is truly random. That day, one of my colleagues asked me whether there are any practical applications of detecting spatial randomness or non-randomness. "Oh, sure," I replied, and rattled off a list of
When you pass a matrix as an parameter (argument) to a SAS/IML module, the SAS/IML language does not create a copy of the matrix. That approach, known as "calling by value," is inefficient. It is well-known that languages that implement call-by-value semantics suffer performance penalties. In the SAS/IML language, matrices