The perfect storm for State Fair attendance!

The North Carolina State Fair is held just a few miles from SAS headquarters, so it's virtually impossible for it to slip by without me noticing. Two aspects of the fair usually get lots of news coverage - what's the latest fair food, and did we set any attendance records? There's not a lot of data available about the food (although I did create this fun graph a few years ago), but thankfully attendance numbers are published daily on the NC State Fair website!

Before we get started, here's an awesome photo my friend John took at this year's State Fair (be sure to click to see the full-size image, so you can take in all the detail!).

fair_john

Now, let's analyze that attendance data ... Here's a screen-capture showing the top of the attendance table from the official webpage (click it to see the full table):

fair_table

I decided to start my analysis by making a similar table, but with a few enhancements: I got rid of the colors, used a '.' rather than 'n/a' for missing values, and sorted the table with the most recent years at the top. I think all these things make it easier to read.

fair_table_rob1

Next, I used some tricky coding techniques to mark the highest attendance for each day in bright green. This involved using user-defined formats with color names, in combination with a somewhat obscure style option in Proc Print. I think this adds important information to the table, and makes it more of an analytic tool than just a table. Wow - looks like 2010 had several record-setting days!

fair_table_rob2
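Here's a minimal sketch of how that kind of highlighting can be done. It is not necessarily the exact code behind the table above: the dataset name `fair`, the variable names, and the format name `satmax` are my own choices for illustration, and only three years and one highlighted column are shown. The idea is to look up the record value, build a user-defined format that maps that value to a color, and then hand the format to the `style(data)` option in Proc Print.

```sas
/* Three years of first-weekend attendance, copied from the published table */
data fair;
  input year fri sat sun;
  datalines;
2014 73571 116267 94691
2015 90954 126666 97906
2016 78364 119117 109429
;
run;

/* Find the record value for one column and store it in a macro variable */
proc sql noprint;
  select max(sat) into :maxsat trimmed from fair;
quit;

/* Map the record value to bright green; every other value stays white */
proc format;
  value satmax &maxsat = 'cx00ff00'
               other   = 'white';
run;

/* The format supplies the background color for each cell in the column */
proc print data=fair noobs;
  var year fri;
  var sat / style(data)={background=satmax.};
  var sun;
  format fri sat sun comma10.;
run;
```

To highlight every day's record, the same three steps would be repeated (or wrapped in a macro loop) for each column.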

Now that we know which days had the record attendance, let's find out which year had the highest total. And what better way to show that than a simple bar chart! Looks like 2010 was the record setter.

nc_statefair_attendance2
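A bar chart like this takes very little code. The sketch below assumes a dataset I've called `totals`, holding just the last three yearly totals from the published table; the axis label is also my own wording.

```sas
/* Yearly totals for the three most recent fairs */
data totals;
  input year total;
  datalines;
2014 929748
2015 1019732
2016 1028364
;
run;

/* One bar per year, with bar height equal to total attendance */
proc sgplot data=totals;
  vbar year / response=total;
  yaxis label='Total attendance' grid;
  format total comma12.;
run;
```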

Now, how about some more detailed plots that allow us to visualize all the individual values in the table? Here's a simple plot of the data by day, with the latest (2016) markers in bright red. Note that the other markers are transparent blue, so you can see where multiple markers 'stack up' on top of each other (multiple/overlapping markers become darker, as the transparent colors combine).

nc_statefair_attendance1
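One way to get this effect, sketched here with Proc SGplot, is to draw two scatter layers: the earlier years in semi-transparent blue, and the latest year in solid red on top. The dataset name `long`, the `earlier` and `latest` columns, and the numeric `day` variable (1 = first Thursday) are assumptions made for this example, and only the first three days of three years are included.

```sas
/* One row per year and day; split the values into two columns so each can be its own layer */
data long;
  input year day attendance;
  if year=2016 then latest=attendance;
  else earlier=attendance;
  datalines;
2014 1 46478
2014 2 73571
2014 3 116267
2015 1 50327
2015 2 90954
2015 3 126666
2016 1 40449
2016 2 78364
2016 3 119117
;
run;

proc sgplot data=long noautolegend;
  /* Transparent markers get darker wherever several of them overlap */
  scatter x=day y=earlier / transparency=0.7
          markerattrs=(symbol=circlefilled color=blue size=12);
  /* The latest year is drawn last, in opaque red, so it sits on top */
  scatter x=day y=latest /
          markerattrs=(symbol=circlefilled color=red size=12);
  xaxis integer label='Day of fair';
  yaxis label='Attendance';
run;
```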

I like that plot, but I'm sure the analysts and statisticians are already salivating for a box plot. I know it's not traditional to show all the markers on a box plot, but I include them so you can see the spread of the actual data.

nc_statefair_attendance1

But even the box plot didn't seem to show all the secrets that I knew were hidden in this data. Therefore I created another plot showing each year of data as a separate line - and with this graph, you can see an oddity in the data for the second Thursday (some of the values were high, and some of them were much lower).

nc_statefair_attendance

And finally, I decided to color the lines by decade. It's not a beautiful graph (some would even disparagingly call it a spaghetti graph), but in this particular situation I think it provides some important insight that the other graphs did not!

With this graph, I was able to determine that the Thursdays with the lower attendance were from the 1980s and 1990s. And then I remembered that in more recent years there has been a big canned food drive on Thursdays: if you bring 5 cans of food to donate to the Food Bank of Central and Eastern North Carolina, you get into the fair for free. According to the fair website, fairgoers have donated more than 4.4 million pounds of food since 1993. This transformed the traditionally lower-attendance Thursday into one of the higher-attendance days.

nc_statefair_attendance
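Here's a rough sketch of one way to draw a line per year and color the lines by decade. It assumes SAS 9.4M2 or later (where, as far as I know, the `grouplc=` option on the series statement is available), and the dataset name `long`, the `decade` variable, and the numeric `day` coding (8 = second Thursday) are my own choices for the example. Only one year per decade and three days around the second Thursday are included.

```sas
/* Wednesday, second Thursday, and second Friday for one year in each decade */
data long;
  input year day attendance;
  decade = floor(year/10)*10;   /* e.g. 1986 -> 1980 */
  datalines;
1986 7 58075
1986 8 53576
1986 9 71363
1996 7 43012
1996 8 86285
1996 9 74508
2006 7 65976
2006 8 103323
2006 9 81734
2016 7 72654
2016 8 104852
2016 9 85473
;
run;

proc sgplot data=long;
  /* group= keeps each year as its own line; grouplc= picks the line color from the decade */
  series x=day y=attendance / group=year grouplc=decade;
  xaxis integer label='Day of fair (8 = second Thursday)';
  yaxis label='Attendance';
run;
```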

Although some portions of NC are still recovering from the flooding caused by Hurricane Matthew, we had nice weather during fair week this year, which probably helped produce the good attendance numbers. I wonder if it would be possible to correlate fair attendance with weather data? Hmm ... maybe a topic for a future blog!

Another factor that might have helped lure people in this year was the awesome new attraction called the Flyer Sky Ride. It's a chair lift that carries passengers above the fairgrounds, from one end to the other. Here's a link to a cool video my friend David made from this ride.

fair_video

I hope you had fun exploring & analyzing this data with me, and hopefully you have learned some tricks and techniques to use on your own data!

 


About Author

Robert Allison

The Graph Guy!

Robert has worked at SAS for over a quarter century, and his specialty is customizing graphs and maps - adding those little extra touches that help them answer your questions at a glance. His educational background is in Computer Science, and he holds a BS, MS, and PhD from NC State University.

3 Comments

  1. Robert,
    It would look better if you used a Medal Graph.


    data have;
    infile datalines expandtabs dlm=' ';
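    /* The two weekends need distinct variable names; repeating Thu Fri Sat Sun would overwrite the first weekend's values */
    input ( Year Thu1 Fri1 Sat1 Sun1 Mon Tue Wed Thu2 Fri2 Sat2 Sun2 Total) (: ?? comma32.);
    array x{*} Thu1--Total;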
    do i=1 to dim(x);
    x{i}=x{i}/1000;
    end;
    drop i;
    datalines;
    1986 n/a 52,175 112,706 80,606 53,031 49,670 58,075 53,576 71,363 70,293 37,281 638,776
    1987 n/a 64,877 107,464 86,873 58,392 67,388 49,331 48,230 59,881 119,531 43,038 705,005
    1988 n/a 46,249 114,899 93,392 52,670 49,437 68,288 56,214 31,402 113,854 58,584 684,989
    1989 n/a 50,330 115,510 86,635 53,715 38,119 52,247 40,096 65,103 98,122 54,061 653,938
    1990 n/a 60,074 85,446 71,633 47,264 51,361 109,077 33,997 75,353 114,977 56,791 705,973
    1991 n/a 53,869 110,920 67,441 41,890 51,594 89,856 55,666 73,578 112,582 53,277 710,673
    1992 n/a 46,646 100,116 70,281 39,974 47,703 62,831 54,709 88,799 118,006 55,606 684,671
    1993 n/a 73,448 83,573 58,207 45,768 48,377 58,633 45,781 69,094 119,448 62,861 665,190
    1994 n/a 15,546 80,027 89,212 86,798 60,788 47,894 34,876 72,736 131,604 58,802 678,283
    1995 n/a 42,479 69,279 89,237 43,358 56,257 58,686 93,179 47,796 130,092 68,940 699,303
    1996 n/a 34,455 110,574 89,309 54,391 63,995 43,012 86,285 74,508 131,287 71,613 759,429
    1997 n/a 44,106 54,500 28,736 56,008 65,099 82,675 57,399 77,299 136,939 31,379 634,140
    1998 n/a 57,948 110,087 94,660 53,131 61,238 55,598 79,440 72,417 122,276 72,561 779,356
    1999 n/a 49,812 104,352 19,762 55,650 57,779 26,276 89,598 87,812 136,832 79,574 707,447
    2000 n/a 53,331 107,971 94,959 69,580 60,275 58,343 96,904 86,075 137,513 81,773 846,724
    2001 n/a 47,940 77,747 46,540 51,921 54,116 57,330 84,975 84,538 113,450 76,620 695,177
    2002 n/a 54,036 86,533 80,685 34,563 53,226 73,763 67,634 46,876 118,905 80,756 696,977
    2003 n/a 61,364 115,016 97,655 54,579 65,923 64,152 96,614 78,855 129,589 70,208 833,955
    2004 n/a 61,289 118,640 95,043 60,296 61,288 57,718 91,556 88,822 119,461 82,206 836,319
    2005 n/a 52,201 103,512 90,153 52,983 59,945 61,306 92,233 81,722 111,634 90,241 795,930
    2006 n/a 52,527 94,963 91,768 55,989 30,840 65,976 103,323 81,734 145,461 63,375 785,956
    2007 n/a 57,798 102,325 93,473 63,032 69,817 75,746 88,801 63,231 145,955 98,433 858,611
    2008 35,215 42,666 87,457 85,495 54,532 71,199 67,028 80,094 63,310 76,296 101,775 765,067
    2009 37,932 60,369 105,885 70,294 62,945 71,537 86,240 108,929 90,166 79,272 104,370 877,939
    2010 47,677 77,485 131,699 112,130 78,748 81,553 69,735 125,573 105,073 151,647 110,567 1,091,887
    2011 44,167 72,550 127,674 107,112 73,729 74,961 49,710 112,822 94,314 146,635 105,499 1,009,173
    2012 42,854 67,508 118,433 100,744 51,781 75,564 75,644 101,272 92,418 139,484 99,595 965,297
    2013 42,626 72,794 97,936 105,310 67,770 68,092 68,252 102,176 82,163 122,223 98,221 927,563
    2014 46,478 73,571 116,267 94,691 62,499 66,758 63,616 93,808 87,827 126,629 97,604 929,748
    2015 50,327 90,954 126,666 97,906 63,989 69,687 71,348 104,887 95,685 140,886 107,397 1,019,732
    2016 40,449 78,364 119,117 109,429 75,243 75,995 72,654 104,852 85,473 150,747 116,041 1,028,364
    ;
    run;
    proc transpose data=have out=want;
    by year;
    run;
    ods graphics/width=2000 height=800;
    proc sgpanel data=want noautolegend nocycleattrs ;
    panelby _name_ / uniscale=row layout=columnlattice spacing=2 onepanel sort=data novarname;
    dot year / response=col1 group=_name_ nostatlabel
    markerattrs=(symbol=circlefilled size=10);
    rowaxis discreteorder=data display=(nolabel) fitpolicy=none;
    colaxis integer display=(nolabel);
    run;
