Here in the United States, we have our general election (where we elect the president) every four years - and 2020 happens to be one of those election years. This time we seem to have a lot more people voting early. I can't tell you the reason they're voting early (perhaps because of COVID-19? perhaps because of this year's political climate?) ... but I can show you a graph of the numbers!
I downloaded the raw data (showing each individual voter who voted early in NC), extracted the zip file, imported the csv into SAS, used SQL to summarize the data, and then a data step to calculate the cumulative totals. I then plotted both the 2016 and 2020 cumulative totals on the same graph, with the x-axis showing the number of days before the election (aligning the two lines on the day early in-person voting started).
And here's the graph:
And the big question - Will the 2020 numbers stay this far ahead of the 2016 numbers? Or will the two curves eventually converge? Feel free to leave a comment with your thoughts on that! (be sure to keep the comments civil, with no "partisan politics"!) 🙂
For those of you who are data-meisters (and in particular SAS programmers), below are some technical details about the code I used to create the graph. And here's a link to the SAS code, and the HTML version of the output (with the live links to the data in the footnotes).
- I got the data from the NC State Board of Elections - here are links to the 2016 and 2020 data.
- Once I downloaded the zip, and extracted the csv, I used Proc Import with dbms=csv to read the data into SAS.
- I used Proc SQL's count() function to calculate the number of votes for each day, and a data step to calculate the cumulative number of votes.
- Proc SGplot allowed me to overlay the two lines.
- I used annotated text for the footnotes (so I could embed links to the data sources in the HTML output).
- I used series plots for the lines, and the 'markers' option to add markers at each data point along the line. I think the markers give a good visual cue as to how fast the totals are increasing (the farther apart the markers, the faster the increase).
- Rather than using a legend, I used the 'curvelabel' option to label each line with the year.
- And I used the 'reverse' option on the xaxis, so the minimum "days before election" is on the right side of the graph.
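
For readers who don't have SAS handy, the counting and cumulative-total steps in the list above can be sketched in plain Python. This is only an illustration of the logic, not the SAS code linked earlier: the function name `cumulative_early_votes`, its parameters, and the sample dates are all made up for this example, and it assumes the raw data has already been reduced to one date per voter.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def cumulative_early_votes(vote_dates, election_day):
    """Return (days_before_election, cumulative_votes) pairs, oldest day first.

    vote_dates   -- one datetime.date per early voter (one per row of raw data)
    election_day -- datetime.date of the general election
    (Both parameter names are hypothetical, chosen for this sketch.)
    """
    # Step 1: count votes per day (the job count() does in the SQL step).
    daily = sorted(Counter(vote_dates).items())
    # Step 2: running total across days (the job the data step does).
    running = accumulate(count for _, count in daily)
    # Step 3: re-express each date as "days before the election" so that
    # two different election years can share one x-axis.
    return [((election_day - day).days, total)
            for (day, _), total in zip(daily, running)]


# Made-up sample rows, for illustration only (not real NC data).
sample = [date(2020, 10, 15)] * 3 + [date(2020, 10, 16)] * 2 + [date(2020, 10, 19)]
for days_before, total in cumulative_early_votes(sample, date(2020, 11, 3)):
    print(days_before, total)
```

Running the same function on each year's dates (with that year's election day) gives two series on a common "days before election" scale, which is what lets the two lines be overlaid. Plotting them with the x values in descending order gives the same left-to-right reading as the 'reverse' axis option.
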
Here's a graph, labeled to show where some of the things I mentioned above appear on the graph: