Many cities have Open Data pages. But once you download the data, what can you do with it? This is my second blog post where I download several datasets from Cary, NC's open data page and give you a few ideas to get you started on your own data exploration!
And what data did I choose this time? Here's a picture from my friend Beth, to give you a hint. That would be a pretty nice fire to sit around on a crisp fall/winter evening, eh?!?
If you guessed "fire incident data" you are correct! Not a nice controlled fire like the one in the picture above - but one you have to call the fire department to come extinguish ...
I downloaded the CSV version of the Cary fire incident data from this page, and then used Proc Import to read it into SAS. After running Proc Import, I saved the DATA step code it generated, to speed up future runs (each time you run Proc Import, it has to scan the values in the file to decide what type and length to give each variable, and that takes time). There's a link to my SAS code at the bottom of this post.
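For anyone following along, a minimal Proc Import call might look something like this. The file name `cary_fire_incidents.csv` and the output table name are hypothetical placeholders, and GUESSINGROWS=MAX (scan every row before choosing types and lengths) is an assumption, not necessarily the setting used for this post.

```sas
/* Read the downloaded CSV into a SAS table.                   */
/* The file name below is a hypothetical placeholder.          */
proc import datafile="cary_fire_incidents.csv"
    out=work.fire_data
    dbms=csv
    replace;
  getnames=yes;        /* first row of the file holds the column names */
  guessingrows=max;    /* scan all rows before picking types/lengths   */
run;
```

For delimited files, Proc Import writes the DATA step it generated to the SAS log. Copying that DATA step into your program and running it in place of Proc Import is one way to skip the scanning on later runs.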
Fire departments respond to many different kinds of emergencies, not just fires. For this analysis I only wanted the incidents that were actual fires, so I subset the data to the rows where majorcategory='FIRE'. Here's a glimpse of what the data looks like in my SAS table:
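The subset is a one-line WHERE statement in a DATA step. This sketch assumes the imported table is called WORK.FIRE_DATA and writes the fires to WORK.FIRES (both hypothetical names); wrapping the variable in UPCASE() is a small defensive choice in case the category values are not consistently upper case in your download.

```sas
/* Keep only the incidents that were actual fires. */
data work.fires;
  set work.fire_data;
  where upcase(majorcategory) = 'FIRE';   /* case-insensitive match */
run;
```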
In addition to the fire incident data, I knew that I also wanted to include the locations of the fire stations in some of my analyses. Therefore I found a list of Cary's fire stations, copy-n-pasted their street addresses into a data step in my SAS code, and then used Proc Geocode to estimate the latitude/longitude coordinates for those addresses.
proc geocode method=street data=station_data out=station_data (rename=(x=station_long y=station_lat));
run; /* METHOD=STREET matches on the full street address */
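The station table that feeds this PROC GEOCODE step is easiest to work with if its address variables use the procedure's default names (ADDRESS, CITY, and STATE), so no extra options are needed to point at them. The sketch below reads the pasted addresses from a pipe-delimited text file instead of inline data lines; the file name and the STATION variable are hypothetical, and the actual station addresses are not reproduced here.

```sas
/* Hypothetical input file, one line per station, in the form: */
/*   station name|street address|Cary|NC                        */
data work.station_data;
  infile "cary_fire_stations.txt" dlm='|' dsd truncover;  /* hypothetical file */
  length station $20 address $80 city $30 state $2;
  input station address city state;
run;
```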
Analysis based on geography
The first question that came to mind about this data was "where did the fires occur?" The incident data contains the lat/long coordinates of each fire, so I didn't have to run Proc Geocode on the street addresses. I plotted the fires as red dots and the fire stations as larger yellow dots. I chose a dark grayscale background map so that my marker colors would not be confused with any of the other map details (as they might be on a typical full-color OpenStreetMap).
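proc sgmap plotdata=all_data noautolegend;
/* assumed basemap: Esri's dark gray canvas tiles, so the red/yellow markers stand out */
esrimap url='http://services.arcgisonline.com/arcgis/rest/services/Canvas/World_Dark_Gray_Base';
scatter x=incident_long y=incident_lat / markerattrs=(symbol=circlefilled size=3pt color="red");
scatter x=station_long y=station_lat / markerattrs=(symbol=circlefilled size=8pt color="yellow");
run;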
The above map certainly helped make the data more approachable. But it also made me curious: which station responded to each fire? Is there a way to show that on the map? I tweaked my code a little and added a line from each fire incident to the station that responded. I also used a different color for each station's fires, to make them easier to group visually. Pretty neat, eh - almost artistic!
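One way to draw those lines is a "spoke" trick, sketched here as an assumption rather than the exact code behind the map. For each incident, output the responding station's coordinates followed by the fire's coordinates, and let a SERIES plot grouped by station connect the points in data order. Each station's line then runs station, fire, station, fire, and so on, so every fire is joined to its station, and GROUP= gives each station its own color. The input table WORK.FIRES_WITH_STATION, its STATION variable (the responding station), and the station coordinates merged onto each incident row are all hypothetical.

```sas
/* Keep each station's incidents together (hypothetical input table). */
proc sort data=work.fires_with_station out=work.fires_by_station;
  by station;
run;

/* Two points per incident: the station, then the fire.         */
/* Connected in data order, they form station-to-fire "spokes". */
data work.spokes;
  set work.fires_by_station;
  line_long = station_long;  line_lat = station_lat;  output;
  line_long = incident_long; line_lat = incident_lat; output;
  keep station line_long line_lat;
run;

proc sgmap plotdata=work.spokes;
  series x=line_long y=line_lat / group=station;  /* one color per station */
run;
```

The trick avoids building a separate line for each incident: because every spoke returns to the station before heading to the next fire, one polyline per station is enough.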
Analysis based on month
The next question that came to mind was whether there are more fires during certain times of the year. So I created a special bar chart where the height of each bar represents how many fires happened that month, and the colors show the type of each fire.
A normal stacked bar chart lumps all the fires of one color into a single bar segment, but I wanted each fire to be represented by its own segment (square). So I'm actually using an SGPlot 'heatmap' rather than a bar chart. It took a little extra work (calculating an x/y coordinate for each square), but I think it was worth it! It looks like the month with the most fires was July 2019, and the month with the fewest was August 2020 (note that the last month in the graph is incomplete, because the data only goes through the 20th of that month).
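The x/y calculation can be sketched like this: sort the fires by month and type, then number them within each month, so each fire gets its own row (Y) in its month's column (X). HEATMAPPARM with COLORGROUP= then draws one colored cell per fire. This is a sketch under assumed names (WORK.FIRES, a SAS date variable INCIDENT_DATE, and a fire-type variable FIRE_TYPE are hypothetical), not necessarily the exact code behind the chart. A PROC FREQ with ORDER=FREQ is a quick way to check which months had the most and fewest fires.

```sas
/* Column: the first day of each incident's month. */
data work.fire_month;
  set work.fires;
  month = intnx('month', incident_date, 0, 'b');
  format month monyy7.;
run;

proc sort data=work.fire_month;
  by month fire_type;          /* same-type squares stack next to each other */
run;

/* Row: position of each fire in its month's stack. */
data work.fire_squares;
  set work.fire_month;
  by month;
  if first.month then stack_y = 0;
  stack_y + 1;                 /* 1 for the month's first fire, 2 for the next, ... */
run;

proc sgplot data=work.fire_squares;
  heatmapparm x=month y=stack_y colorgroup=fire_type;
  xaxis display=(nolabel);
  yaxis label='Number of fires';
run;

/* Months listed from most to fewest fires. */
proc freq data=work.fire_month order=freq;
  tables month / nocum;
run;
```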
Analysis based on time
In addition to the geographical and month-by-month analyses above, I thought it might be interesting to look for trends by time of day. So I created a histogram showing the number of incidents by hour: I rounded each incident time to the nearest hour, and the height of each bar represents the number of fire incidents in that hour.
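Rounding to the nearest hour can be done on the SAS time value (seconds since midnight) by dividing by 3600 and rounding; MOD(..., 24) then folds anything from 23:30 onward into hour 0 (midnight). That wrap-around choice and the INCIDENT_TIME and WORK.FIRES names are assumptions of this sketch. SCALE=COUNT matters here because an SGPLOT histogram shows percentages by default.

```sas
data work.fire_hours;
  set work.fires;
  /* incident_time: hypothetical SAS time value, in seconds since midnight */
  hour_rounded = mod(round(incident_time / 3600), 24);   /* 0-23 */
run;

proc sgplot data=work.fire_hours;
  histogram hour_rounded / binwidth=1 binstart=0 scale=count;  /* one bar per hour */
  xaxis values=(0 to 23 by 1) label='Hour of day (rounded)';
  yaxis label='Number of fire incidents';
run;
```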
Do you see any trends in the data? Were they trends you expected to see, or did any of them surprise you? What other analyses might provide more insight into this data? Here's a link to my SAS code, in case you would like to experiment with it, and maybe modify it to plot your city's open data.
Also, here's a link to the interactive version of the charts, with mouse-over text for all the markers on the maps, etc.