Blogs

Author

Rick Wicklin
Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

Rick Wicklin, October 17, 2012

Specify the colors of groups in SAS statistical graphics

Sometimes a graph is more interpretable if you assign specific colors to categories. For example, if you are graphing the number of Olympic medals won by various countries at the 2012 London Olympics, you might want to assign the colors gold, silver, and bronze to represent first-, second-, and third-place

Rick Wicklin, October 15, 2012

Women and jobs: Redesigning a New York Times graphic

The New York Times has an excellent staff that produces visually interesting graphics for the general public. However, because their graphs need to be understood by all Times readers, the staff sometimes creates a complicated infographic when a simpler statistical graph would show the data in a clearer manner. A

Advanced Analytics

Rick Wicklin, October 10, 2012

Playing "craps" with unfair dice

Last week I wrote a SAS/IML program that computes the odds of winning the game of craps. I noted that the program remains valid even if the dice are not fair. For convenience, here is a SAS/IML function that computes the probability of winning at craps, given the probability vector

Rick Wicklin, October 8, 2012

Generate uniform data in a simplex

It is easy to simulate data that is uniformly distributed in the unit cube for any dimension. However, it is less obvious how to generate data in the unit simplex. The simplex is the set of points (x_1, x_2, ..., x_d) such that Σ_i x_i = 1 and 0 ≤ x_i ≤ 1

Advanced Analytics

Rick Wicklin, October 4, 2012

Dice probabilities and the game of "craps"

Gambling games that use dice, such as the game of "craps," are often used to demonstrate the laws of probability. For two dice, the possible rolls and probability of each roll are usually represented by a matrix. Consequently, the SAS/IML language makes it easy to compute the probabilities of various

Rick Wicklin, October 2, 2012

Open question in 1937...short SAS program today

John D. Cook posted a story about Hardy, Ramanujan, and Euler that discusses a conjecture in number theory from 1937. Cook says, Euler discovered 635,318,657 = 158^4 + 59^4 = 134^4 + 133^4 and that this was the smallest [integer] known to be the sum of two fourth powers in two

Rick Wicklin, October 1, 2012

Access rows or columns of a matrix by names

Did you know that you can index into SAS/IML matrices by using unique strings that you assign via the MATTRIB statement? The MATTRIB statement associates various attributes with a matrix. Usually, these attributes are only used for printing, but you can also use the ROWNAME= and COLNAME= attributes to subset

Advanced Analytics

Rick Wicklin, September 26, 2012

A surprising result: The expected number of uniform variates whose sum exceeds one

I was recently flipping through Ross' Simulation (2006, 4th Edition) and saw the following exercise: Let N be the minimum number of draws from a uniform distribution [until the sum of the variates] exceeds 1. What is the expected value of N? Write a simulation to estimate the expected value. For

Advanced Analytics

Rick Wicklin, September 24, 2012

Grouping observations based on quantiles

Sometimes it is useful to group observations based on the values of some variable. Common schemes for grouping include binning and using quantiles. In the binning approach, a variable is divided into k equal intervals, called bins, and each observation is assigned to a bin. In this scheme, the size

Rick Wicklin, September 19, 2012

Visualizing congressional representation by state and time

With the US presidential election looming, all eyes are on the Electoral College. In the presidential election, each state gets as many votes in the Electoral College as it has representatives in both congressional houses. (The District of Columbia also gets three electors.) Because every state has two senators, it

Advanced Analytics

Rick Wicklin, September 17, 2012

Filling the lower and upper triangular portions of a matrix

If you use a word three times, it's yours. -Unknown

When I was a child, my mother used to encourage me to increase my vocabulary by saying, "If you use a word three times, it's yours for life." I believe that the same saying holds for programming techniques: Use a

Rick Wicklin, September 12, 2012

When is a correlation matrix not a correlation matrix?

This article is an excerpt from my forthcoming book Simulating Data with SAS. Not every matrix with 1 on the diagonal and off-diagonal elements in the range [–1, 1] is a valid correlation matrix. A correlation matrix has a special property known as positive semidefiniteness. All correlation matrices are positive

Rick Wicklin, September 10, 2012

Visualizing US commute times and congestion

Robert Allison posted a map that shows the average commute times for major US cities, along with the proportion of the commute that is attributed to traffic jams and other congestion. The data are from a CEOs for Cities report (Driven Apart, 2010, p. 45). Robert used SAS/GRAPH software to

Advanced Analytics

Rick Wicklin, September 6, 2012

Testing for equality of sets

Ah! The joys of sets! It is easy to test whether two vectors are equal in SAS/IML software. It is only slightly more challenging to test whether two sets are equal. Recall that A and B are equal as sets if they contain the same elements. Order does not matter.

Rick Wicklin, September 4, 2012

Construct the equation of a line: An exercise in string concatenation

I needed to construct a string to use in the title of a scatter plot. The scatter plot showed a line, and I wanted to include the equation of the line in the plot's title. This article shows how to construct a string that contains the equation in a readable

Rick Wicklin, August 29, 2012

Construct a magic square of any size

Magic squares are cool. Algorithms that create magic squares are even cooler. You probably remember magic squares from your childhood: they are n x n matrices that contain the numbers 1, 2, ..., n^2 and for which the row sum, column sum, and the sum of both diagonals are the same value. There are many

Rick Wicklin, August 27, 2012

The MOD function and negative values

When I studied math in school, I learned that the expression a (mod q) is always an integer between 0 and q – 1 for integer values of a and q. It's a nice convention, but SAS and many other computer languages allow the result to be negative if a (or q) is

Rick Wicklin, August 22, 2012

What is the difference between categories and groups in PROC SGPLOT?

The other day I was using PROC SGPLOT to create a box plot and I ran a program that was similar to the following:

proc sgplot data=sashelp.cars;
title "Box Plot: Category = Origin";
vbox Horsepower / category=origin;
run;

An hour or so later I had a need for another box

Advanced Analytics

Rick Wicklin, August 20, 2012

How to return multiple values from a SAS/IML function

The SAS/IML language supports user-defined functions (also called modules). Many SAS/IML programmers know that you can use the RETURN statement to return a value from a user-defined function. For example, the following function returns the sum of each column of a matrix:

proc iml;
start ColSum(M);
return( M[+, ] ); /*

Advanced Analytics

Rick Wicklin, August 16, 2012

Extract the lower triangular elements of a matrix

It is common to want to extract the lower or upper triangular elements of a matrix. For example, if you have a correlation matrix, the lower triangular elements are the nontrivial correlations between variables in your data. As I've written before, you can use the VECH function to extract the

Rick Wicklin, August 13, 2012

Suppress variable labels in SAS procedures

Sometimes a small option can make a big difference. Last week I thought to myself, "I wish there were an option that prevents variable labels from appearing in a table or graph." Well, it turns out that there is! I was using PROC MEANS to display some summary statistics, and

Rick Wicklin, August 9, 2012

Discriminating Fisher's iris data by using the petal areas

I've seen analyses of Fisher's iris data so often that sometimes I feel like I can smell the flowers' scent. However, yesterday I stumbled upon an analysis that I hadn't seen before. The typical analysis is shown in the SAS/STAT documentation for the CANDISC procedure. A (canonical)

Rick Wicklin, August 6, 2012

Change a plot title by using the ODS Graphics Editor

A comment to last week's article on "How to get data values out of ODS graphics" indicated that the technique would be useful for changing the title on an ODS graph "without messing around with GTL." You can certainly use the technique for that purpose, but if you want to

Rick Wicklin, August 1, 2012

How to get data values out of ODS graphics

Many SAS procedures can produce ODS statistical graphics as naturally as they produce tables. Did you know that it is possible to obtain the numbers underlying an ODS statistical graph? This post shows how. Suppose that a SAS procedure creates a graph that displays a curve and that you want

Rick Wicklin, July 30, 2012

The power operators: Powers of matrices and matrix elements

I received the following question: In the DATA step I always use the ** operator to raise a value to a power, like this: x**2. But on your blog you use the ## operator to raise values to a power in SAS/IML programs. Does SAS/IML not support the **

Rick Wicklin, July 25, 2012

Using macro loops for simulation

Last week I wrote an article in which I pointed out that many SAS programmers write a simulation in SAS by writing a macro loop. This approach is extremely inefficient, so I presented a more efficient technique. Not only is the macro loop approach slow, but there are other undesirable

Rick Wicklin, July 18, 2012

Simulation in SAS: The slow way or the BY way

Over the past few years, and especially since I posted my article on eight tips to make your simulation run faster, I have received many emails (often with attached SAS programs) from SAS users who ask for advice about how to speed up their simulation code. For this reason, I

Rick Wicklin, July 16, 2012

Indexing a SAS data set to improve processing categories in SAS/IML

I have blogged about three different SAS/IML techniques that iterate over categories and process the observations in each category. The three techniques are as follows: Use a WHERE clause on the READ statement to read only the observations in the ith category. This is described in the article "BY-group processing

Advanced Analytics

Rick Wicklin, July 11, 2012

Visualize the bivariate normal cumulative distribution

When you are working with probability distributions (normal, Poisson, exponential, and so forth), there are four essential functions that a statistical programmer needs. As I've written before, for common univariate distributions, SAS provides the following functions: the PDF function, which returns the probability density at a given point the CDF

Advanced Analytics

Rick Wicklin, July 9, 2012

Reordering data to match a target order

Suppose that you have two data vectors, x and y, with the same number of elements. How can you rearrange the values of y so that they have the same relative order as the values of x? In other words, find a permutation, π, of the elements of y so
