Scatter plot matrix with a twist


The MATRIX statement of PROC SGSCATTER supports a couple of different configurations. The basic one is an N x N panel of cells, with each cell showing a scatter plot for one pair of variables. Here is an example of a 3 x 3 scatter plot matrix for the IRIS data set.

Proc SGSCATTER Code for 3 x 3 Matrix:

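proc sgscatter data=sashelp.iris;
  /* Compact labels; these appear in the diagonal cells */
  label sepallength='SepalLength' petallength='PetalLength';
  label sepalwidth='SepalWidth' petalwidth='PetalWidth';
  /* 3 x 3 matrix of pairwise scatter plots */
  matrix sepallength petallength sepalwidth;
run;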

The diagonal cells contain the variable names, and each row and column shows its axis range, alternating between the two sides of the plot. This plot type is useful for viewing not only the associations between the variables but also the data ranges in context.

A second configuration draws a distribution plot, such as a histogram, in the diagonal cell for each variable. Because the diagonal cells are now occupied, the variable names move to the outside of the grid. Here is an example of a 4 x 4 scatter plot matrix with histograms.

Proc SGSCATTER Code for 4 x 4 Matrix:

proc sgscatter data=sashelp.iris;
  label sepallength='SepalLength' petallength='PetalLength';
  label sepalwidth='SepalWidth' petalwidth='PetalWidth';
  /* DIAGONAL= draws a histogram of each variable in the diagonal cells */
  matrix sepallength petallength sepalwidth petalwidth / diagonal=(histogram);
run;

In this configuration, the histograms are displayed in the diagonal cells, and the variable names or labels are shown along the side and top of the graph. The axis ranges are not displayed, because they would apply only to the data axis and not to the response axis of each histogram.
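The histogram is only one kind of distribution plot the diagonal can hold. As a small sketch, the DIAGONAL= option of the MATRIX statement also accepts NORMAL and KERNEL, which overlay a fitted normal density curve and a kernel density estimate on each histogram. This assumes the same SASHELP.IRIS variables as above; the LABEL statements are left out, so the graph uses the data set's default variable labels.

```sas
/* Sketch: same 4 x 4 matrix, adding a normal curve and a kernel  */
/* density estimate over each histogram in the diagonal cells.    */
proc sgscatter data=sashelp.iris;
  matrix sepallength petallength sepalwidth petalwidth
         / diagonal=(histogram normal kernel);
run;
```

The axis ranges are still hidden in this variant, since the diagonal cells continue to use a density or percent response axis.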

Recently, two different SAS users expressed interest in a scatter plot matrix that shows both the histograms and the data ranges. We sent them the suggestion shown below. When Rick Wicklin also asked a related question, I thought it was time to follow up with an example. In my opinion, a good configuration is to place the histograms along the top, so that each histogram's analysis (X) variable shares its axis with the X variable of the scatter plots below it. The response (percent) axis can then be made uniform across all the histograms.

Such a scatter plot matrix can be created with the LAYOUT LATTICE container in the Graph Template Language (GTL). Here is an example:

The complete SAS 9.2 GTL program is included below. It can easily be extended to larger matrices and converted into a macro, which is a topic for a future post.

Full SAS 9.2 Program:  MatrixTopHist
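The full program is not reproduced in this post. As a rough illustration of the lattice approach (not the author's MatrixTopHist program), the template below assumes a small rectangular layout: two X variables (SepalLength, SepalWidth) in the columns, a row of histograms on top, and two rows of scatter plots with PetalLength and PetalWidth on the Y axes. The template name MatrixTopHistSketch is hypothetical. COLUMNDATARANGE=UNION gives each column one shared X axis, so each histogram lines up with the scatter plots below it, and ROWDATARANGE=UNION puts all the histograms on one common percent axis.

```sas
/* Hypothetical sketch, not the author's MatrixTopHist program. */
proc template;
  define statgraph MatrixTopHistSketch;
    begingraph;
      entrytitle 'Scatter Plot Matrix with Histograms on Top';
      /* 3 rows x 2 columns: row 1 = histograms, rows 2-3 = scatter plots.   */
      /* UNION data ranges share one axis per column and one axis per row.   */
      /* ROWWEIGHTS makes the histogram row shorter than the scatter rows.   */
      layout lattice / columns=2 rows=3
                       rowweights=(0.2 0.4 0.4)
                       columndatarange=union rowdatarange=union
                       rowgutter=5px columngutter=5px;
        /* One COLUMNAXIS per column: the X variables */
        columnaxes;
          columnaxis / label='SepalLength';
          columnaxis / label='SepalWidth';
        endcolumnaxes;
        /* One ROWAXIS per row: the common percent axis, then the Y variables */
        rowaxes;
          rowaxis / label='Percent';
          rowaxis / label='PetalLength';
          rowaxis / label='PetalWidth';
        endrowaxes;

        /* Cells are filled row by row (the default ORDER=ROWMAJOR) */
        /* Row 1: histograms of the X variables */
        histogram sepallength / scale=percent;
        histogram sepalwidth  / scale=percent;
        /* Row 2: Y = PetalLength */
        scatterplot x=sepallength y=petallength;
        scatterplot x=sepalwidth  y=petallength;
        /* Row 3: Y = PetalWidth */
        scatterplot x=sepallength y=petalwidth;
        scatterplot x=sepalwidth  y=petalwidth;
      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.iris template=MatrixTopHistSketch;
run;
```

Because the axes are declared at the lattice level rather than per cell, the data ranges stay visible along the outer edges, which is what the diagonal-histogram configuration gives up.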


About Author

Sanjay Matange

Director, R&D

Sanjay Matange is R&D Director in the Data Visualization Division responsible for the development and support of the ODS Graphics system, including the Graph Template Language (GTL), Statistical Graphics (SG) procedures, ODS Graphics Designer and related software. Sanjay has co-authored a book on SG Procedures with SAS/PRESS.


4 Comments

    • Dan Heath

      For that many variables, I wouldn't use a scatter plot matrix. I would write a macro that created "thumbnail" fit plots of the variable crossings (no scatter points or axes), laid out in a 50x50 grid. This grid gives you a "feel" for the data, much like a scatter plot matrix. The fits help you spot odd trends that you might want to investigate further. Hope this helps!
