Using data to define hurricane season


In a previous blog post about hurricanes, I created a histogram of the occurrence of tropical cyclones in the Atlantic basin during the years 1988–2003. That histogram shows that the peak of hurricane activity occurs in the second week of September, but also that a majority of tropical storms occur between 01JUN and 30NOV of each year, a time period that is called "hurricane season."

Why are these specific dates used? The National Hurricane Center (NHC) states that "these dates were selected to encompass over 97% of tropical activity." Interesting! In statistical terms, these dates correspond to quantiles of the distribution of hurricane occurrence!

My previous post did not describe how I transformed the DATE variable in the Hurricanes data set into a day-of-the-year variable that is suitable for plotting with a histogram. Easy. I simply applied a SAS format! Specifically, I used the JULDAY format, which returns the ordinal date (also called the Julian date) for each Gregorian date.

For example, 01FEB (for any year) corresponds to an ordinal date of 32, since it is the thirty-second day of the year.
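As a quick check of that mapping, here is a minimal sketch that applies the JULDAY format to a few hand-picked dates. The dates and the variable names `d` and `doy` are illustrative choices and are not taken from the Hurricanes data.

```sas
proc iml;
/* Illustrative dates only (not from the Hurricanes data) */
d = "01FEB2003"d // "31DEC2003"d // "31DEC2000"d;

/* PUTN returns a character string, so NUM converts it to a number */
doy = num(putn(d, "julday3."));

/* 01FEB is day 32; 31DEC is day 365, or day 366 in a leap year such as 2000 */
print d[format=date9.] doy;
quit;
```

The leap-year case shows why a day-of-year variable built this way can take values up to 366.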

The following statements read all the tropical storms and hurricanes from the Hurricanes data set, and create the DAYOFYEAR variable by using the PUTN function to apply the JULDAY format. (The data are distributed with SAS/IML Studio, but you can also download the data from the right-hand links of the SAS/IML Studio product page.) Because a character string results from applying the format, it is necessary to apply the NUM function to convert the string into a number.

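proc iml;
/* Location of the sample data that are distributed with SAS/IML Studio */
libname iml "C:\Program Files\SAS\SASIMLStudio\3.2\Data Sets";
/* Exclude observations with a missing category and tropical depressions (TD) */
use iml.Hurricanes(where=(Category^=" " & Category ^= "TD"));
read all var {Date};
close iml.Hurricanes;
/* Apply the JULDAY format, then convert the resulting string to a number */
DayOfYear = num(putn(Date, "julday8."));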

The DAYOFYEAR variable contains values 1–366. I can use the QNTL subroutine to find dates q1 and q2 such that "over 97% of tropical activity" occurs between q1 and q2. I don't know exactly which value the NHC means by "over 97%," but I’ll compute q1 and q2 for the central 97.5% of tropical activity, which leaves 1.25% of the storms in each tail:

call qntl(q, DayOfYear, {0.0125 0.9875});
print q[format=Date5.];
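The two probabilities passed to QNTL follow from splitting the excluded activity equally between the early and late tails. As a short worked derivation (the symbol c for the coverage level is notation introduced here, not something the NHC defines):

```latex
p_{\text{low}} = \frac{1 - c}{2}, \qquad
p_{\text{high}} = 1 - \frac{1 - c}{2} = \frac{1 + c}{2}

c = 0.975 \;\Rightarrow\;
p_{\text{low}} = \frac{0.025}{2} = 0.0125, \qquad
p_{\text{high}} = \frac{1.975}{2} = 0.9875
```

An equal split is one reasonable convention. An asymmetric split would give different dates for the same coverage.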

For the 1988–2003 storms, 97.5% of the tropical activity occurs between 06JUN and 01DEC, which is amazingly close to the dates that the NHC uses to define "hurricane season."
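Because the NHC says only "over 97%," it can be instructive to see how sensitive the end points are to the coverage level. The following is a sketch, not a definitive analysis: the candidate levels in `coverage` are hypothetical choices, and it assumes that QNTL returns one row per requested probability when the data are a column vector. The results are printed as day-of-year numbers rather than as formatted dates.

```sas
proc iml;
/* Same library path and filter as the statements earlier in this post */
libname iml "C:\Program Files\SAS\SASIMLStudio\3.2\Data Sets";
use iml.Hurricanes(where=(Category^=" " & Category ^= "TD"));
read all var {Date};
close iml.Hurricanes;
DayOfYear = num(putn(Date, "julday8."));

/* Hypothetical candidate coverage levels; the NHC states only "over 97%" */
coverage = {0.95, 0.97, 0.975, 0.99};

/* Split the excluded proportion equally between the two tails */
pLow  = T( (1 - coverage) / 2 );
pHigh = T( (1 + coverage) / 2 );

/* One lower and one upper quantile (as day of year) per coverage level */
call qntl(qLow,  DayOfYear, pLow);
call qntl(qHigh, DayOfYear, pHigh);
print coverage qLow qHigh;
quit;
```

Each printed row gives the first and last day of the year that bracket the stated proportion of storms, so the rows can be compared directly against the ordinal dates for 01JUN and 30NOV.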

That’s pretty cool! "Hurricane season" is not only a meteorological phenomenon, but a statistical one as well.


About Author

Rick Wicklin

Distinguished Researcher in Computational Statistics

Rick Wicklin, PhD, is a distinguished researcher in computational statistics at SAS and is a principal developer of SAS/IML software. His areas of expertise include computational statistics, simulation, statistical graphics, and modern methods in statistical data analysis. Rick is author of the books Statistical Programming with SAS/IML Software and Simulating Data with SAS.

1 Comment

  1. I never thought about this before. I wonder why they didn't use 95% quantiles? Maybe the catastrophic nature of a hurricane causes them to use a wider interval.
