At the Joint Statistical Meetings in San Diego this week, I attended a panel on The Evolution of Statistical Graphics and Visual Analytics.
There was a lot of discussion about what one graph from The New York Times, which plots miles driven against the price of gas over time, represents (a projection of a non-parametric relationship) and how viewers might interpret it (they may initially think time is on the X axis), but I was disappointed that there wasn't an exploration of how the author, Hannah Fairfield, may have arrived at this representation.
I wanted to explore this data to see what other views of the relationships represented above look like.
I wasn't able to get the original source data, so I had to synthesize the data from the graph. As a result, my data may not line up exactly with the actual values, but it's close enough to see what the graphs show. You can download my JMP data table from the File Exchange.
Usually, when looking at data with a time element, our first instinct is to put the time value on the X axis and the measures on the Y. Since we have two measures, miles driven and price of gas, with very different scales, I initially thought that a graph with a dual Y axis would be a good way to compare the relationship between these measures over time.
I'm not really happy with this, though. The overlapping lines don't really tell the story. I could probably adjust the scales of the Y axes to provide some separation, but if I'm going to do that, I might as well show two graphs with a common X axis.
That's a little better. It's easier to see the two trends, but I'm not really seeing the relationship between miles driven and the price of gas. Perhaps I should graph these two variables directly.
During the panel, Antony Unwin thought it curious that the "response," miles driven, wasn't shown on the Y axis in The New York Times graphic. As I began to explore the relationship, I thought of miles driven as the response as well and put it on the Y axis. I also added a smoother to help bring out the pattern in this data.
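If you want to try the smoothing step outside JMP, here is a minimal Python sketch. It is not the smoother I used in JMP; it swaps in a simple centered running mean as a stand-in, and the function name `running_mean_smoother` and its `window` parameter are my own assumptions for illustration.

```python
def running_mean_smoother(x, y, window=5):
    """Return (x_sorted, y_smoothed) using a centered running mean.

    Assumption: a running mean stands in for a spline-style smoother.
    x would be the price of gas and y the miles driven.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")

    # Order the points by the X variable so neighbors are adjacent.
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    half = window // 2
    smoothed = []
    for i in range(len(ys)):
        # Shrink the window at the edges instead of padding.
        lo = max(0, i - half)
        hi = min(len(ys), i + half + 1)
        smoothed.append(sum(ys[lo:hi]) / (hi - lo))
    return xs, smoothed
```

Plotting the returned values as a line over the raw scatter gives a rough trend. A wider `window` smooths more but can hide the two groups of points.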
Now, we're getting somewhere! The smoother shows that there's a relationship between the price of gas and the number of miles driven, but it's awfully noisy, and there are clearly two groups of points in this data. Also, this graph leaves out the time variable, so we're losing some potentially illuminating information.
What if I connect the points in time order?
Now, because I'm getting fairly familiar with this data, I'm able to see the time series trend moving upward along the Y axis, but I suspect that a casual observer won't spot that very quickly since we're not used to seeing time move vertically in a graph.
So, I should swap the X and Y axes, and that gets me to the original graph from The New York Times.
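The two steps here, connecting points in time order and then swapping the axes, can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `connected_scatter_path` and the argument names are hypothetical, and the inputs are plain lists of equal length.

```python
def connected_scatter_path(years, miles, price):
    """Order observations by year for a connected scatterplot.

    Returns (x, y, labels) with miles driven on X and the price of
    gas on Y, i.e. the swapped orientation. The line is drawn through
    the points in the order returned.
    """
    if not (len(years) == len(miles) == len(price)):
        raise ValueError("all inputs must have the same length")

    # Sorting by year is what makes the connecting line follow time.
    rows = sorted(zip(years, miles, price))

    x = [m for _, m, _ in rows]       # miles driven on the X axis
    y = [p for _, _, p in rows]       # price of gas on the Y axis
    labels = [yr for yr, _, _ in rows]  # year labels for annotating points
    return x, y, labels
```

Passing `x` and `y` to any line-plot routine, and labeling a few points with `labels`, gives the viewer a way to follow time along the path since it no longer sits on an axis.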
This is interesting data because there are two time series trends: miles driven has generally been increasing over time, and the price of gas has mostly been falling over time, with a few notable exceptions. Showing both of these trends while making the exceptions visible is the key to this graph.
A colleague who attended the panel discussion with me suggested that an animated bubble plot was an obvious choice for this data. I don't usually like animation when I only have one series of data over time since it just delays my ability to see the whole series. I've created that bubble plot below. What do you think?