Earlier today, Rick posted interesting information about which time of year the most babies are born, at least in the USA.
I don't have data nearly as extensive as what's available at the NCHS, but I do have a sample of birthday records to compare against Rick's findings. My sample comes from my Facebook account, where many of my friends share the information about their birthdays so that at least once a year, they receive friendly cybergreetings from all the people that they've connected with.
As you might know, I wrote an application that can turn my Facebook friend data into SAS data sets. At the moment, I'm connected to 355 people on Facebook (that number drops by the minute as my friends read this blog post). Of the 355 friends, only 236 share their birthday information (month and day). Even fewer (only 106) share their birth year as part of that. The policies for sharing this information are among the privacy settings that you can tweak on your Facebook account.
As part of the results of the SAS program that I've generated, I've included a report of "Known birthdays in calendar order". By using these data records, I can check the small sample of my friends' birthdays against the slightly larger set of the entire country.
I've included an "anonymized" table of just the September birthdays here. Of the 236 birthday records that I have, 19 of them fall in September.
For my data, September is not the "birthiest" month; April is. I can see that with this simple SGPLOT output of birthday frequency by month.
But does this mean that April (with 28 birthdays) was truly the busiest period for birthdays among my friends? Month boundaries are a bit arbitrary ("30 days has September, April, June..."). In Rick's post, he pointed out that weeks 37, 38, and 39 combined represent the peak number of births. What if I generate a report of frequency of births per week number? (I used the WEEK function to calculate the week number from each date.) According to the chart below, week 14 (which creeps into April) has the highest number, with 10 birthdays.
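For readers who want to try the per-week count on their own data, here is a minimal sketch of the arithmetic. It is not the SAS program behind the chart; it is a small Python illustration. It assumes birthdays are stored as (month, day) pairs and that every birthday is placed in a single reference year (2011 here, an arbitrary choice), because a week number cannot be computed without a year. Python's `%U` format uses a Sunday-based week numbered 0 to 53, which I believe is the same convention as the default behavior of the SAS WEEK function. The names `week_number`, `births_per_week`, and the sample data are made up for illustration.

```python
from collections import Counter
from datetime import date


def week_number(d: date) -> int:
    """Sunday-based week of the year (0-53); days before the first Sunday are week 0."""
    return int(d.strftime("%U"))


def births_per_week(birthdays, year=2011):
    """Count birthdays per week number.

    birthdays: iterable of (month, day) pairs.
    year: reference year used to build full dates (an assumption; a Feb 29
          birthday would need a leap year here).
    """
    counts = Counter()
    for month, day in birthdays:
        counts[week_number(date(year, month, day))] += 1
    return counts


# Made-up sample, not real friend data
sample = [(4, 3), (4, 5), (9, 13), (1, 1)]
counts = births_per_week(sample)
for week in sorted(counts):
    print(week, counts[week])
```

Changing the reference year shifts where the week boundaries fall, so the week that looks "busiest" can move by one when a different year is used.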
Using a quick-and-dirty DATA step and the LAG function, I built another column that represents "3-week running total", just to see where the highest 3-week period falls. In my case, the highest-volume 3-week period appears to end with week 40. That actually falls pretty close to the peak time of year reflected in the national statistics.
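The running total can be sketched the same way. Again, this is a Python illustration of the idea rather than the DATA step I used: for each week it adds that week's count to the counts of the two preceding weeks, which is what two lagged values plus the current value give you. The function name `running_total_3` and the counts below are invented for the example. One assumption worth calling out: weeks with no birthdays are treated as zero, whereas a lag over a frequency table that omits empty weeks would reach back to the previous row instead of the previous week number.

```python
def running_total_3(counts, first_week=0, last_week=53):
    """Return {week: births in that week plus the two preceding weeks}.

    counts: mapping of week number -> number of birthdays.
    Weeks missing from counts (including weeks before first_week) count as zero.
    """
    totals = {}
    for week in range(first_week, last_week + 1):
        totals[week] = sum(counts.get(w, 0) for w in (week - 2, week - 1, week))
    return totals


# Made-up weekly counts, not real friend data
counts = {13: 4, 14: 10, 15: 3, 38: 6, 39: 7, 40: 8}
totals = running_total_3(counts)

# The week that ends the highest-volume 3-week window
peak_week = max(totals, key=totals.get)
print(peak_week, totals[peak_week])
```

Note that the window does not wrap around the new year in this sketch, so the totals for weeks 0 and 1 cover fewer than three weeks.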
What insight have I gained about my friends here? None, really. That's because when it comes to the information that they share on Facebook, many of my friends are less than honest...as illustrated in the following story.
Earlier this year I noticed (on Facebook) that my friend John had a birthday coming up, so I wrote a "Happy Birthday" message on his Facebook wall. (I probably said something really clever...who knows?) When I next saw him in person (I do occasionally interact with people that way), John said to me: "Chris, January 1 is my cyber birthday, the birthday I use online to separate those who know me well from the posers." Well, I guess we know where I fall among those groups.
I always presumed the formula was:
September_blip = Event(NewYearsEve)+mdy(9,0,0);
April_blip = Event(SummerParties)+mdy(9,0,0);
It would be interesting to compare your "Northern Hemisphere" April blip with a "Southern Hemisphere" equivalent - it may just exacerbate our September result. (Although September is when our football finals are played... plus 9 months... so does that mean NFL finals plus 9 months, etc., etc.?)
Not sure how to code for fake dates in Facebook.
Nice post. As always, I am impressed by your ability to gather data from all sorts of sources. Your analysis inspired me to blog a little more on this topic: http://blogs.sas.com/content/iml/2011/09/13/the-birthday-controversy-are-more-people-born-in-april-or-september/