I saw an interesting graph on dadaviz.com that claimed Italians had gone from drinking twice as much as Americans in 1970, to less than Americans in recent years. The data analyst in me just had to "independently verify" this factoid ...
But before I get into the technical part of this blog, I want to show the results of a little 'contest'. I asked my friends to submit pictures they thought would go best with this blog. There were a *lot* of great entries! And the winner is Stuart - it appears he single-handedly brings up the per-capita average wherever he drinks!
And now, on with the blog! Here's a snapshot of the original graph I saw on dadaviz.com. It is a visually dramatic graph, and makes it seem like alcohol consumption in several countries has dropped like a rock since the 1970s. But is it a true representation of the data?
As I looked more closely, a couple of things made me distrust their graph ... I thought it odd that the y-axis started at 5 instead of zero, and that they would pick this proportion (taller than wide) for a time-series graph. Both of these techniques can be used to create deceptive graphs.
So I set about making my own SAS version of the graph to correct those two problems, and also make a few other improvements. I found the data on the oecd.org website and downloaded it in Excel spreadsheet format. I imported the spreadsheet into a SAS dataset, transposed it (so the years were values in rows rather than separate columns), and then plotted it with Proc Gplot. Since this is a long time series, I made the proportions much wider than tall (specifying goptions xpixels and ypixels). I started the y-axis at zero (using the order= option on an axis statement). I placed the country labels at the end of each line (using annotated text labels). And I included an extra y-axis on the right of the graph, so you don't have to look way over to the left axis to see the values (using a plot2 statement). Here's how my final graph came out - I think it's very nice looking, and represents the data well.
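For anyone who wants to try something similar, the steps described above could be sketched roughly like this. It is a minimal sketch under stated assumptions, not the exact code behind the graph: the file name `alcohol.xlsx`, the dataset names (`wide`, `long`, `labels`), the variable names (`country`, `liters`, `year`), the spreadsheet layout (one row per country, one numeric column per year), and the upper limit of the y-axis are all made up for illustration.

```sas
/* Sketch only: file, dataset, and variable names are assumptions. */

/* 1. Import the spreadsheet (assumed: one row per country,      */
/*    one numeric column per year, imported as _1970, _1971 ...). */
proc import datafile="alcohol.xlsx" out=wide dbms=xlsx replace;
   getnames=yes;
run;

/* 2. Transpose so that each row is one country/year observation. */
proc sort data=wide;
   by country;
run;

proc transpose data=wide out=long(rename=(col1=liters)) name=year_name;
   by country;
run;

data long;
   set long;
   /* Keep only the digits of the old column name to get the year. */
   year = input(compress(year_name, , 'kd'), 4.);
run;

/* 3. Annotate data set: a text label at the end of each line. */
proc sort data=long;
   by country year;
run;

data labels;
   length function $8 text $40;
   retain xsys ysys '2' hsys '3' when 'a'
          function 'label' position '6' size 2.5;
   set long(where=(liters ne .));
   by country;
   if last.country;          /* last non-missing point per country */
   x = year;
   y = liters;
   text = country;
run;

/* 4. Wide proportions, y-axis starting at zero, right-hand axis. */
goptions xpixels=1000 ypixels=450;

axis1 order=(0 to 25 by 5) minor=none label=none;  /* upper limit assumed */
axis2 minor=none label=none offset=(0 pct, 12 pct); /* room for labels */
symbol1 interpol=join value=none repeat=100;

proc gplot data=long annotate=labels;
   plot  liters*year=country / vaxis=axis1 haxis=axis2 nolegend;
   plot2 liters*year=country / vaxis=axis1 nolegend;
run;
quit;
```

The plot2 statement reuses the same axis definition as the left axis, so both sides show identical scales. To show only a handful of countries, a where clause on the `long` dataset would do the subsetting before the annotate step.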
But there was one thing still lacking, from an analytic point of view. The spreadsheet also contained data for several other countries, so I thought it would be useful to show them too (as light gray lines in the background). This allows you to see the 5 countries 'in context' with the other countries. It's not as pretty a graph, but I think it's a good additional way to look at the data.
So, what's your theory as to why alcohol consumption in some countries has been changing over time (at least, according to this data)? Do you have any special insight or inside information you can share?