In response to a previous article on Violin Plots, a reader asked about creating mirrored histograms to compare propensity scores. While I had my own understanding of "Mirrored Histograms", I also looked the term up on the web. A Google search showed many examples of two histograms placed back to back, either horizontally or vertically. Given these examples, the question needed more than a simple reply. It deserved an article of its own.
I created two graphs, one for each case, assuming the data has two columns. I used the sashelp.cars and sashelp.heart data sets.
Horizontal Mirrored Histogram:
To create this graph, we use GTL to create a 2-cell graph using LAYOUT LATTICE. Full code is included in the attached file. The key steps are as follows:
- Use LAYOUT LATTICE to define a 2-column lattice. Use ORDER=ColumnMajor.
- Add a LAYOUT OVERLAY to define a histogram in each cell with horizontal orientation.
- Reverse the X axis for the plot in the first cell.
- Equalize the tick values and axis ranges in both cells for easier comparison of bin heights.
- Wrap each LAYOUT OVERLAY in a CELL block to add the cell headers. This places the City and Highway labels inside each cell. The same can be done with simple ENTRY statements, as in the vertically mirrored case.
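The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the attached file: the template name MirrorHistHorz, the use of mpg_city and mpg_highway from sashelp.cars, the bin settings, and the 0 to 60 percent axis range are all assumptions that you would adjust for your own data.

```sas
proc template;
  define statgraph MirrorHistHorz;   /* hypothetical template name */
    begingraph;
      entrytitle 'Mileage Distribution';
      /* Two columns side by side; the data (Y) range is shared across the row */
      layout lattice / columns=2 order=columnmajor columngutter=0
                       rowdatarange=union;

        /* Left cell: reversing the X axis makes the bars grow to the left */
        cell;
          cellheader;
            entry 'City';
          endcellheader;
          layout overlay / xaxisopts=(reverse=true
                             /* assumed range; must cover the tallest bin */
                             linearopts=(viewmin=0 viewmax=60
                                         tickvaluelist=(0 20 40 60)));
            histogram mpg_city / orient=horizontal binstart=10 binwidth=5;
          endlayout;
        endcell;

        /* Right cell: same axis range and ticks, normal direction */
        cell;
          cellheader;
            entry 'Highway';
          endcellheader;
          layout overlay / xaxisopts=(linearopts=(viewmin=0 viewmax=60
                                                  tickvaluelist=(0 20 40 60)))
                           yaxisopts=(display=none);
            histogram mpg_highway / orient=horizontal binstart=10 binwidth=5;
          endlayout;
        endcell;

      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.cars template=MirrorHistHorz;
run;
```

Using the same BINSTART= and BINWIDTH= values in both cells keeps the bins aligned across the mirror, and the explicit VIEWMIN=, VIEWMAX= and TICKVALUELIST= settings are one way to equalize the two percent axes.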
Vertical Mirrored Histogram:
This graph is also created using GTL LAYOUT LATTICE, this time with two rows. As in the horizontal case, one LAYOUT OVERLAY is added to each cell, and here the Y axis of the second cell is reversed. The remaining options are set in the same way as for the horizontal graph.
In both these cases, the mirrored layout creates an interesting graph, but it is not easy to compare the heights of the bins because they are on opposite sides of the mirror. Also, only two variables can be compared this way.
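A rough sketch of the two-row version is shown here. It is an illustration under assumptions rather than the attached code: the template name MirrorHistVert, the choice of the systolic and diastolic variables from sashelp.heart, the bin settings, and the 0 to 40 percent axis range are all assumed.

```sas
proc template;
  define statgraph MirrorHistVert;   /* hypothetical template name */
    begingraph;
      entrytitle 'Distribution of Blood Pressure';
      /* Two rows stacked; the data (X) range is shared down the column */
      layout lattice / rows=2 rowgutter=0 columndatarange=union;

        /* Top cell: normal Y axis, X axis drawn along the mirror line */
        layout overlay / xaxisopts=(display=(ticks tickvalues line))
                         yaxisopts=(linearopts=(viewmin=0 viewmax=40
                                                tickvaluelist=(0 10 20 30 40)));
          histogram systolic / binstart=50 binwidth=5;
          entry halign=left 'Systolic' / valign=top;
        endlayout;

        /* Bottom cell: reversing the Y axis makes the bars hang downward */
        layout overlay / xaxisopts=(display=none)
                         yaxisopts=(reverse=true
                                    /* assumed range; adjust to the data */
                                    linearopts=(viewmin=0 viewmax=40
                                                tickvaluelist=(0 10 20 30 40)));
          histogram diastolic / binstart=50 binwidth=5;
          entry halign=left 'Diastolic' / valign=bottom;
        endlayout;

      endlayout;
    endgraph;
  end;
run;

proc sgrender data=sashelp.heart template=MirrorHistVert;
run;
```

Here the ENTRY statements take the place of the CELLHEADER blocks, placing each label inside its own cell.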
Maybe a better way is to use overlaid histograms. Here are some examples:
This is a single-cell graph and can be created using the SGPLOT procedure. Note that the two histograms are made partially transparent so that both remain visible. The bin start and bin width are set to the same values for both histograms.
proc sgplot data=sashelp.heart;
  title 'Distribution of Blood Pressure';
  histogram diastolic / binstart=50 binwidth=5 transparency=0.5;
  histogram systolic / binstart=50 binwidth=5 transparency=0.5;
  xaxis display=(nolabel);
  yaxis grid;
  keylegend / location=inside position=topright across=1;
run;
In my opinion, comparison of the two histograms is much easier in this case. The shapes can be seen clearly, and the heights of the bins can be easily compared. Also, the code is very simple.
Another benefit is that this technique can easily be extended to multiple variables. In the example below I created a data set with three variables for comparison. Kernel density plots are added.
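The three-variable case might look something like the sketch below. The data set work.three and its variables a, b and c are hypothetical, simulated here only so that the example is self-contained. They are not the data used for the graph in this article, and the bin and transparency settings are likewise assumptions.

```sas
/* Hypothetical data: three simulated variables on a comparable scale */
data work.three;
  call streaminit(12345);
  do i = 1 to 1000;
    a = rand('normal', 50, 10);
    b = rand('normal', 60, 8);
    c = rand('normal', 75, 12);
    output;
  end;
  drop i;
  label a='Group A' b='Group B' c='Group C';
run;

proc sgplot data=work.three;
  title 'Comparative Distributions';
  /* Same bin start and width for all three; higher transparency for three layers */
  histogram a / binstart=0 binwidth=5 transparency=0.7;
  histogram b / binstart=0 binwidth=5 transparency=0.7;
  histogram c / binstart=0 binwidth=5 transparency=0.7;
  /* Kernel density estimate for each variable */
  density a / type=kernel;
  density b / type=kernel;
  density c / type=kernel;
  xaxis display=(nolabel);
  yaxis grid;
  keylegend / location=inside position=topright across=1;
run;
```

With three overlaid layers the bars become harder to tell apart, so the density curves do most of the work of showing how the shapes differ.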
Here is a link to a previous article on Comparative Density Plots.
Full SAS 9.3 Code: ComparativeHistograms