Sometimes a graph is more interpretable if you assign specific colors to categories. For example, if you are graphing the number of Olympic medals won by various countries at the 2012 London Olympics, you might want to assign the colors gold, silver, and bronze to represent first-, second-, and third-place
Author
The New York Times has an excellent staff that produces visually interesting graphics for the general public. However, because their graphs need to be understood by all Times readers, the staff sometimes creates a complicated infographic when a simpler statistical graph would show the data in a clearer manner. A
Last week I wrote a SAS/IML program that computes the odds of winning the game of craps. I noted that the program remains valid even if the dice are not fair. For convenience, here is a SAS/IML function that computes the probability of winning at craps, given the probability vector
It is easy to simulate data that is uniformly distributed in the unit cube for any dimension. However, it is less obvious how to generate data in the unit simplex. The simplex is the set of points (x1,x2,...,xd) such that Σi xi = 1 and 0 ≤ xi ≤ 1
Gambling games that use dice, such as the game of "craps," are often used to demonstrate the laws of probability. For two dice, the possible rolls and probability of each roll are usually represented by a matrix. Consequently, the SAS/IML language makes it easy to compute the probabilities of various
John D. Cook posted a story about Hardy, Ramanujan, and Euler and discusses a conjecture in number theory from 1937. Cook says, Euler discovered 635,318,657 = 158^4 + 59^4 = 134^4 + 133^4 and that this was the smallest [integer]known to be the sum of two fourth powers in two
Did you know that you can index into SAS/IML matrices by using unique strings that you assign via the MATTRIB statement? The MATTRIB statement associates various attributes to a matrix. Usually, these attributes are only used for printing, but you can also use the ROWNAME= and COLNAME= attributes to subset
I was recently flipping through Ross' Simulation (2006, 4th Edition) and saw the following exercise: Let N be the minimum number of draws from a uniform distribution [until the sum of the variates]exceeds 1. What is the expected value of N? Write a simulation to estimate the expected value. For
Sometimes it is useful to group observations based on the values of some variable. Common schemes for grouping include binning and using quantiles. In the binning approach, a variable is divided into k equal intervals, called bins, and each observation is assigned to a bin. In this scheme, the size
With the US presidential election looming, all eyes are on the Electoral College. In the presidential election, each state gets as many votes in the Electoral College as it has representatives in both congressional houses. (The District of Columbia also gets three electors.) Because every state has two senators, it
If you use a word three times, it's yours. -Unknown When I was a child, my mother used to encourage me to increase my vocabulary by saying, "If you use a word three times, it's yours for life." I believe that the same saying holds for programming techniques: Use a
This article is an excerpt from my forthcoming book Simulating Data with SAS. Not every matrix with 1 on the diagonal and off-diagonal elements in the range [–1, 1] is a valid correlation matrix. A correlation matrix has a special property known as positive semidefiniteness. All correlation matrices are positive
Robert Allison posted a map that shows the average commute times for major US cities, along with the proportion of the commute that is attributed to traffic jams and other congestion. The data are from a CEOs for Cities report (Driven Apart, 2010, p. 45). Robert use SAS/GRAPH software to
Ah! The joys of sets! It is easy to test whether two vectors are equal in SAS/IML software. It is only slightly more challenging to test whether two sets are equal. Recall that A and B are equal as sets if they contain the same elements. Order does not matter.
I needed to construct a string to use in the title of a scatter plot. The scatter plot showed a line, and I wanted to include the equation of the line in the plot's title. This article shows how to construct a string that contains the equation in a readable
Magic squares are cool. Algorithms that create magic squares are even cooler. You probably remember magic squares from your childhood: they are n x n matrices that contain the numbers 1,2,...,n2 and for which the row sum, column sum, and the sum of both diagonals are the same value. There are many
When I studied math in school, I learned that the expression a (mod n) is always an integer between 0 and q – 1 for integer values of a and q. It's a nice convention, but SAS and many other computer languages allow the result to be negative if a (or q) is
The other day I was using PROC SGPLOT to create a box plot and I ran a program that was similar to the following: proc sgplot data=sashelp.cars; title "Box Plot: Category = Origin"; vbox Horsepower / category=origin; run; An hour or so later I had a need for another box
The SAS/IML language supports user-defined functions (also called modules). Many SAS/IML programmers know that you can use the RETURN function to return a value from a user-defined function. For example, the following function returns the sum of each column of matrix: proc iml; start ColSum(M); return( M[+, ] ); /*
It is common to want to extract the lower or upper triangular elements of a matrix. For example, if you have a correlation matrix, the lower triangular elements are the nontrivial correlations between variables in your data. As I've written before, you can use the VECH function to extract the
Sometimes a small option can make a big difference. Last week I thought to myself, "I wish there were an option that prevents variable labels from appearing in a table or graph." Well, it turns out that there is! I was using PROC MEANS to display some summary statistics, and
I've seen analyses of Fisher's iris data so often that sometimes I feel like I can smell the flowers' scent. However, yesterday I stumbled upon an analysis that I hadn't seen before. The typical analysis is shown in the documentation for the CANDISC procedure in the SAS/STAT documentation. A (canonical)
A comment to last week's article on "How to get data values out of ODS graphics" indicated that the technique would be useful for changing the title on an ODS graph "without messing around with GTL." You can certainly use the technique for that purpose, but if you want to
Many SAS procedures can produce ODS statistical graphics as naturally as they produce tables. Did you know that it is possible to obtain the numbers underlying an ODS statistical graph? This post shows how. Suppose that a SAS procedure creates a graph that displays a curve and that you want
I received the following question: In the DATA step I always use the ** operator to raise a values to a power, like this: x**2. But on your blog I you use the ## operator to raise values to a power in SAS/IML programs. Does SAS/IML not support the **
Last week I wrote an article in which I pointed out that many SAS programmers write a simulation in SAS by writing a macro loop. This approach is extremely inefficient, so I presented a more efficient technique. Not only is the macro loop approach slow, but there are other undesirable
Over the past few years, and especially since I posted my article on eight tips to make your simulation run faster, I have received many emails (often with attached SAS programs) from SAS users who ask for advice about how to speed up their simulation code. For this reason, I
I have blogged about three different SAS/IML techniques that iterate over categories and process the observations in each category. The three techniques are as follows: Use a WHERE clause on the READ statement to read only the observations in the ith category. This is described in the article "BY-group processing
When you are working with probability distributions (normal, Poisson, exponential, and so forth), there are four essential functions that a statistical programmer needs. As I've written before, for common univariate distributions, SAS provides the following functions: the PDF function, which returns the probability density at a given point the CDF
Suppose that you have two data vectors, x and y, with the same number of elements. How can you rearrange the values of y so that they have the same relative order as the values of x? In other words, find a permutation, π, of the elements of y so