What is it that people like so much about motorcycles? The thrill/excitement/freedom of riding them, the 'biker image' portrayed in movies, or great songs such as Little Honda by the Hondells? I'm not a biker per say, but I do have a couple of motorcycles, and am a known associate of some bikers. Therefore when I saw a graph of some biker data, it caught my attention and I had to try my hand at improving it.
But first, here's a picture of one of my biker buddies here at SAS, Mark - he's got several bikes, and especially likes to ride on race tracks. The graphs below have a '>=50' age category, and you probably would have never guessed that Mark has been in that category for many years! :)
And now, let's proceed with the data analysis!
I recently saw a very interesting graph on dadaviz.com showing motorcycle deaths by age category. The feature about the graph that jumps out at you is that the percent of deaths for the <=29 age group seems to have gone down significantly since 1975. So, of course, I found the data, and created my own version of the graph using SAS:
The data also had the total number of deaths per age category, so I plotted that too. And from this plot, it looks like the number of deaths has generally been on the increase since year 2000.
But both of the above plots have a serious weakness - they don't take into consideration the number of riders (or bikes) in each age category. And that particular data (by age category) seems a bit harder to come by, so I decided to work with the data that is easily available, and see if I could shed some more light on these numbers.
First I added up the deaths from all age groups, and plotted the total deaths per year:
I then found a data source for the total number of motorcycles registered per year, and plotted that. Notice that the number of motorcycles has been climbing since year 2000:
And finally, I used Proc Sql to combine the data from the previous two charts, and calculated the number of deaths per 100,000 registered motorcycles.
proc sql noprint; create table combined as select unique deaths_by_age.year, deaths_by_age.total as total_deaths, registration_data.total as total_registrations from deaths_by_age left join registration_data on deaths_by_age.year=registration_data.year; quit; run; data combined; set combined; deaths_per_100k_bikes = total_deaths / (total_registrations/100000); run;
And this number has been generally going down over time, which is good news, and is a better/more-true representation of what's actually happening, IMHO.
Hopefully you enjoyed this little demonstration of the power of combining data!