Creating heat maps in SAS/IML

In a previous blog post, I showed how to use the graph template language (GTL) in SAS to create heat maps with a continuous color ramp. SAS/IML 13.1 includes the HEATMAPCONT subroutine, which makes it easy to create heat maps with continuous color ramps from SAS/IML matrices. Typical usage includes creating heat maps of data matrices, correlation matrices, or covariance matrices. A more advanced usage is using matrices to visualize the results of computational algorithms or computer experiments.

Data matrices

Heat maps provide a convenient way to visualize many variables and/or many variables in a single glance. For example, suppose that you want to compare and contrast the 24 trucks in the Sashelp.Cars data by using the 10 numerical variables in the data set. The following SAS/IML statements read the data into a matrix and sort the matrix according to the value of the Invoice variable (that is, by price). The Model variable, which identifies each truck, is sorted by using the same criterion:

proc iml;
use Sashelp.Cars where(type="Truck");
read all var _NUM_ into Trucks[c=varNames r=Model];
close Sashelp.Cars;
 
call sortndx(idx, Trucks[,"Invoice"]);     /* find indices that sort the data */
Trucks = Trucks[idx,]; Model = Model[idx]; /* sort the data and the models    */

The original variables are all measured on different scales. For example, the Invoice variable has a mean value of $23,000, whereas the EngineSize variable has a mean value of 4 liters. Most of the values in the matrix are less than 300. Consequently, a heat map of the raw values would not be illuminating: The price variables would be displayed as dark cells (high values) and the rest of the matrix would be almost uniformly light (small values, relative to the prices). Fortunately, the HEATMAPCONT subroutine supports the SCALE="Col" option to standardize each variable to have mean 0 and unit variance. With that option, a heat map reveals the high, middle, and low values of each variable:

call HeatmapCont(Trucks) scale="Col"; /* standardize cols */

The HEATMAPCONT output shows the following default values:

The default color ramp is the 'TwoColor' color ramp. With my ODS style, white is used to display small values and dark blue is used to display large values. The color ramp indicates that the standardized values are approximately in the range [-2, 3], which probably indicates that several variables have positive skewness.
The axes are labeled by using the row numbers of the 24 x 10 matrix.
The Y axes "points down." In other words, the heat map displays the matrix as it would be printed: the top of the heat map displays the first few rows and the bottom of the heat map displays the last few rows.

A problem with this initial heat map is that we don't know which trucks correspond to the rows, nor do we know which variables corresponds to the columns. You can use the XVALUES= and YVALUES= options to add that information to the heat map. Also, let's use a three-color color ramp and center the range of the color ramp so that white represents the mean value for each variable and red (blue) represents positive (negative) deviations from the mean. The resulting image is shown below (click to enlarge):

call HeatmapCont(Trucks) scale="Col"           /* standardize cols    */
     xvalues=varNames yvalues=Model            /* label rows and cols */
     colorramp="ThreeColor" range={-3 3}       /* color range [-3,3]  */
     legendtitle = "Standardized values" title="Truck Data";

This visualization of the data enables us to draw several conclusions about the data. In general, expensive cars are powerful (large values of EngineSize/, Cylinders, and Horsepower), large in size (large values of Weight, Wheelbase, and Length), and get poor fuel economy (small values of MPG_City and MPG_Highway). However, two vehicles seem unusual. Compared with similarly priced trucks, the Baja is smaller and more fuel efficient. Compared with similarly priced trucks, the Tundra is more powerful and larger in size.

Correlation matrices

Because the data matrix is sorted by price, it looks like price variables are positively correlated with all variables except for the fuel economy variables. Let's see if that is the case by computing the correlation matrix and displaying it as a heat map:

corr = corr(Trucks);
call HeatmapCont(corr)
     xvalues=varNames yvalues=varNames
     colorramp="ThreeColor" range={-1 1}
     title="Correlations in Truck Data";

The correlation matrix confirms our initial impression. The price variables are strongly correlated with the variables that indicate the power and size of the trucks. The price is negatively correlated with the fuel efficiency variables.

One of the advantages of computing correlation analysis in the SAS/IML language is that you can easily change the order of the variables. For example, the following statements move the fuel efficiency variables to the end, so that all of the positively correlated variables are grouped together:

newOrder = (1:5) || (8:10) || (6:7);
corr = corr[newOrder, newOrder];
varNames = varNames[newOrder];

Can you think of other applications of visualizing matrices by using the HEATMAPCONT subroutine? Leave a comment.

17 Comments

Michelle Homes on August 20, 2014 8:29 am

The HeatmapCont subroutine is a gem! Thanks for the blog post.

And I like how you can change the order of the variables in the correlation matrix...

Cheers,
Michelle

- Michelle Homes on September 5, 2014 7:36 am
  
  Also discovered your YouTube presentation... great examples and detail! https://www.youtube.com/watch?v=GgBOfVcz0vE&feature=em-subs_digest
  
Ian Wakeling on August 20, 2014 10:59 am

There is a neat way to change the order automatically, simply sort according to the first eigenvector of the correlation matrix. For example
.
newOrder = - eigvec(corr)[ , 1]`;
newOrder[ rank( newOrder ) ] = 1 : ncol(corr);
.
should have a similar effect to Rick's manual version. I think Michael Friendly used this method, and something else more complicated too, but memory fails me on the other method.

- Rick Wicklin on August 20, 2014 11:12 am
  
  For readers interested in learning more about permuting the order of variables in order to group "similar" variables, I recommend reading Hurley (2004), "Clustering Visualizations of Multidimensional Data," JCGS.
  
Pingback: The frequency of bigrams in an English corpus - The DO Loop
Pingback: Counting observations in two-dimensional bins - The DO Loop
Pingback: Creating discrete heat maps in SAS/IML - The DO Loop
Pingback: Video: Using heat maps to visualize matrices - The DO Loop
Philippe on November 3, 2014 2:57 pm

Hi,
I have a question concerning heatmap. I have the former version of SAS, and the commands presented here are not taken into account. Nevertheless I will use proc template with heatmapparm as well as proc sgrender. Suppose that on the X-axis I've got eleven variables, and on the Y-axis, I have dates, say around 1000. Is there's a way to control the bins so for a variable the colors appear to be continuous ? rather than dicrete ? If not, is there just a way to control the bins ?
Best
Philippe

- Rick Wicklin on November 3, 2014 3:39 pm
  
  Yes, you can control the colors. Use the RANGEATTRMAP statement to define the color ramp.
  Use the RANGEATTRMAP statement to associate the color ramp to a variable.
  On the HEATMAPPARM statement, use the COLORRESPONSE= option to specify the range variable. It will look something like this (untested):
  
  begingraph;
  rangeattrmap name='Custom';
  range -3-3 / rangecolormodel=(CXFFFFB2 CXFECC5C CXFD8D3C CXF03B20 CXBD0026);
  endrangeattrmap;
  rangeattrvar attrvar=rangevar var=Z attrmap='Custom';
  layout overlay / ...;
  heatmapparm x=X y=Y colorresponse=rangevar;
  endlayout;
  endgraph;
  
Pingback: Popular posts from The DO Loop in 2014 - The DO Loop
B. Hein on January 13, 2015 3:46 pm

This is great! BUT,...is there a way to add the actual values in the cells. I know my supervisor will want to see them.

- Rick Wicklin on January 14, 2015 6:40 am
  
  Yes, although I find that combining the two can result in a crowded graph. You could print the table of values in case he asks. If you decide that you want the labels displayed on the heat map, you need to write the GTL template. See the article "Designing a quantile bin plot." You can download the program that creates those plots. Although none of them are exactly what you need, they should get you started.
  
Pingback: Lasagna plots in SAS: When spaghetti plots don't suffice - The DO Loop
Pingback: How to build a correlations matrix heat map with SAS - The SAS Dummy
Pingback: Visualize a matrix in SAS by using a discrete heat map - The DO Loop
Pingback: Use a bar chart to visualize pairwise correlations - The DO Loop

Blogs

Blogs

Creating heat maps in SAS/IML

Data matrices

Correlation matrices

About Author

17 Comments

Leave A Reply Cancel Reply