The State Fair in North Carolina is just a few miles from SAS headquarters, so it's virtually impossible for it to slip by without my noticing. Two aspects of the fair usually get lots of news coverage: what's the latest fair food, and did we set any attendance records? There's not a lot of data available about the food (although I did create this fun graph a few years ago), but thankfully attendance numbers are published daily on the NC State Fair website!
Before we get started, here's an awesome photo my friend John took at this year's State Fair (be sure to click to see the full-size image, so you can take in all the detail!).
Now, let's analyze that attendance data ... Here's a screen-capture showing the top of the attendance table from the official webpage (click it to see the full table):
I decided to start my analysis by making a similar table, but with a few enhancements. First, I got rid of the colors, used a '.' rather than 'n/a', and sorted the table with the most recent years at the top. I think all these things make it easier to read.
Next, I used some tricky coding techniques to mark the highest attendance for each day in bright green. This involved using user-defined formats with color names, in combination with a somewhat obscure style option in Proc Print. I think this adds important information to the table, and makes it more of an analytic tool rather than just a table. Wow - looks like 2010 had several record-setting days!
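For readers who want to try the record-marking idea outside of Proc Print, here is a minimal Python sketch of the same logic. It is not the code behind the table above: the `attendance` values are made-up placeholders (not real fair numbers), the helper names `record_by_day` and `render` are hypothetical, and an asterisk stands in for the green highlight.

```python
# Hypothetical placeholder values, NOT real attendance figures.
# None marks a missing value (shown as '.' in the table).
attendance = {
    2014: {"Thu": 100, "Fri": 250, "Sat": None},
    2015: {"Thu": 120, "Fri": 200, "Sat": 300},
    2016: {"Thu": 90, "Fri": 260, "Sat": 280},
}


def record_by_day(table):
    """Return {day: (year, value)} for the highest non-missing value of each day."""
    records = {}
    for year, days in table.items():
        for day, value in days.items():
            if value is None:
                continue
            # Strict '>' means the first year seen keeps the record on a tie.
            if day not in records or value > records[day][1]:
                records[day] = (year, value)
    return records


def render(table, records):
    """Build text rows, most recent year first, flagging record cells with '*'."""
    lines = []
    for year in sorted(table, reverse=True):
        cells = []
        for day, value in table[year].items():
            text = "." if value is None else str(value)
            if value is not None and records[day] == (year, value):
                text += "*"
            cells.append(text)
        lines.append(f"{year}  " + "  ".join(cells))
    return lines


records = record_by_day(attendance)
for line in render(attendance, records):
    print(line)
```

The two-pass design (find the records first, then render) mirrors the idea of computing the per-day maximum before deciding how to style each cell.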
Now that we know which days had the record attendance, let's find out which year had the highest total. And what better way to show that than a simple bar chart! Looks like 2010 was the record setter.
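The yearly totals behind a bar chart like this are just row sums. Here is a rough Python sketch, again with made-up placeholder values rather than the real attendance numbers, and with a hypothetical helper name `yearly_totals`. It assumes missing days are simply skipped, which means a year with missing days will show a lower total.

```python
# Hypothetical placeholder values, NOT real attendance figures.
attendance = {
    2014: {"Thu": 100, "Fri": 250, "Sat": None},
    2015: {"Thu": 120, "Fri": 200, "Sat": 300},
    2016: {"Thu": 90, "Fri": 260, "Sat": 280},
}


def yearly_totals(table):
    """Sum the non-missing daily values for each year."""
    return {
        year: sum(v for v in days.values() if v is not None)
        for year, days in table.items()
    }


totals = yearly_totals(attendance)
# The year with the largest total (the first such year wins on a tie).
best_year = max(totals, key=totals.get)
print(best_year, totals[best_year])
```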
Now, how about some more detailed plots that allow us to visualize all the individual values in the table? Here's a simple plot of the data by day, with the latest (2016) markers in bright red. Note that the other markers are transparent blue, so you can see where multiple markers 'stack up' on top of each other (multiple/overlapping markers become darker, as the transparent colors combine).
I like that plot, but I'm sure the analysts and statisticians are already salivating for a box plot. I know it's not traditional to show all the markers when using a box plot, but I like to be able to see the spread of the actual data, so I include them.
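If you're curious what a box plot summarizes for each day, here is a small Python sketch of the five-number summary using the standard library's `statistics.quantiles`. The `values` list is a made-up placeholder, and `five_number_summary` is a hypothetical helper name. Quartile conventions differ between tools, so the box edges computed this way may not exactly match the ones in the graph.

```python
from statistics import quantiles

# Hypothetical placeholder values for one fair day across several years.
values = [10, 20, 30, 40, 50]


def five_number_summary(data):
    """Return the min, quartiles, and max that a box plot is drawn from."""
    ordered = sorted(data)
    # n=4 splits the data into quartiles and returns the three cut points.
    q1, median, q3 = quantiles(ordered, n=4)
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


summary = five_number_summary(values)
print(summary)
```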
But even the box plot didn't seem to show all the secrets that I knew were hidden in this data. Therefore I created another plot showing each year of data as a separate line - and with this graph, you can see an oddity in the data for the second Thursday (some of the values were high, and some of them were much lower).
And finally, I decided to color the lines by decade. It's not a beautiful graph (some would even disparagingly call it a spaghetti graph), but in this particular situation I think it provides some important insight that the other graphs did not!
With this graph, I was able to determine that the Thursdays with the lower attendance were from the 1980s and 1990s. And then I remembered that in more recent years there has been a big canned food drive on Thursdays where if you bring 5 cans of food to donate to the Food Bank of Central and Eastern North Carolina, you get into the fair for free. According to the fair website, since 1993, more than 4.4 million pounds of food have been donated by fairgoers. This transformed the traditionally lower-attendance Thursday into one of the higher-attendance days.
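The decade comparison can also be checked numerically. Here is a minimal Python sketch that groups second-Thursday values by decade and averages them. The `thursday` values are made-up placeholders (not real attendance figures), and `mean_by_decade` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical placeholder values for the second Thursday, keyed by year.
thursday = {1988: 40, 1995: 50, 2005: 90, 2012: 110, 2016: 100}


def mean_by_decade(by_year):
    """Group yearly values into decades (1988 -> 1980) and average each group."""
    groups = defaultdict(list)
    for year, value in by_year.items():
        groups[year // 10 * 10].append(value)
    return {decade: mean(vals) for decade, vals in sorted(groups.items())}


decade_means = mean_by_decade(thursday)
for decade, avg in decade_means.items():
    print(f"{decade}s: {avg}")
```

With real data, a jump between decades in this summary would point to the same pattern as the colored lines in the graph.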
Although some portions of NC are still recovering from the flooding caused by Hurricane Matthew, we had nice weather during the fair week this year. This probably helped produce the good attendance numbers. I wonder if it would be possible to correlate fair attendance to weather data? Hmm ... maybe a topic for a future blog!
Another factor which might have helped lure in people this year was the awesome new attraction called the Flyer Sky Ride. It's a chair lift that carries passengers above the fairgrounds, from one end to the other. Here's a link to a cool video my friend David made from this ride.
I hope you had fun exploring & analyzing this data with me, and hopefully you have learned some tricks and techniques to use on your own data!