With many names, it's difficult to know whether the person is male or female. Let's use the power of analytics to determine which names are the most unisex, based on the number of boys and girls with those names.
But, before we get started, here's a picture of my friend Thelma & her cat Angel. Can you tell whether Angel is a boy or girl cat? - See, unisex names aren't just for humans anymore! :)
These days, we deal with many people virtually (through email and such), instead of in person. And it's often difficult to determine the gender of a person based on just their name (not that their gender should matter, of course ... but sometimes it's nice to know). I recently saw a very interesting analysis on flowingdata.com that showed The Most Unisex Names in US History, from 1930-2012, and I wondered if I could do something similar using SAS.
I located the raw data for baby names, and was happy to see that two additional years were available since the flowingdata study (it now goes up to 2014). Based on some previous graphs I had created with the baby name data, I knew that naming trends had started to change around the 1970s, so I decided to focus my study on 1970 to present.
I used Proc Sql to join the male and female data by year and name, and then limited my study to just the names that occurred for both boys and girls in every year from 1970 to present. For each name, I calculated the deviation from an even 50% male / 50% female split in each year, and then averaged that deviation over all the years. After limiting and ranking, I plotted the 35 names with the lowest average deviation.
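For readers who don't have SAS handy, here is a rough Python sketch of the steps described above. It is not my SAS code, just one way the same idea could be written. The function name rank_unisex_names, the (year, name, sex, count) record layout, and the placeholder names and counts are all made up for illustration, and I'm assuming "deviation" means the absolute difference between the male share and 0.5.

```python
from collections import defaultdict


def rank_unisex_names(records, first_year, last_year, top_n=35):
    """Rank names by average deviation from a 50/50 male/female split.

    records: iterable of (year, name, sex, count), with sex "M" or "F".
    This record layout is an assumption made for this sketch.
    """
    # counts[name][year] -> {"M": count, "F": count}
    counts = defaultdict(lambda: defaultdict(dict))
    for year, name, sex, count in records:
        if first_year <= year <= last_year:
            counts[name][year][sex] = count

    years = range(first_year, last_year + 1)
    ranked = []
    for name, by_year in counts.items():
        # Keep only names given to both boys and girls in every year.
        in_every_year = all(
            by_year.get(y, {}).get("M", 0) > 0
            and by_year.get(y, {}).get("F", 0) > 0
            for y in years
        )
        if not in_every_year:
            continue

        # Assumed deviation: how far the male share is from 0.5.
        deviations = []
        for y in years:
            males = by_year[y]["M"]
            females = by_year[y]["F"]
            deviations.append(abs(males / (males + females) - 0.5))

        ranked.append((sum(deviations) / len(deviations), name))

    # Lowest average deviation first = most unisex.
    ranked.sort()
    return [(name, avg) for avg, name in ranked[:top_n]]


# Made-up placeholder names and counts, only to show the mechanics.
records = [
    (1970, "Alpha", "M", 50), (1970, "Alpha", "F", 50),
    (1971, "Alpha", "M", 60), (1971, "Alpha", "F", 40),
    (1970, "Beta", "M", 90), (1970, "Beta", "F", 10),
    (1971, "Beta", "M", 80), (1971, "Beta", "F", 20),
    (1970, "Gamma", "M", 30), (1970, "Gamma", "F", 30),
    (1971, "Gamma", "M", 30),  # no girls in 1971, so Gamma is dropped
]
result = rank_unisex_names(records, 1970, 1971)
for name, avg in result:
    print(f"{name}: average deviation {avg:.3f}")
```

In this toy data, Alpha ranks ahead of Beta because its yearly split stays closer to 50/50, and Gamma is left out because it is missing girls in one of the years.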
Here's a screen-capture of the graphs. Based on these graphs, can you answer the question posed in the title - is Jessie or Angel more of a unisex name?
Do you see any interesting trends/changes over time, and do you have any theories that might explain the changes?
6 Comments
Hi Robert, is the sample code for this chart available? I didn't see it in your post. Also, I am looking for a similar solution using bar charts in addition to the line chart you have in your small multiples example.
Thanks!
Here's a link to the SAS code! http://robslink.com/SAS/democd84/unisex_names_info.htm
Although my charts might look like a line/area chart, they are actually bar charts (with very tiny bars, packed very tightly together with no space between them!) :)
Growing up, I had some friends (brothers and a sister) whose parents were both called Pat.
Ha! - What are the odds!?! ... Oh wait, with this audience, someone is likely to take me seriously, and calculate the actual odds! ;)
Very interesting analysis.
Some names have multiple spellings, e.g. jamey or jamie, nicky or nickey or nickie. How did you handle that?
I made it simple on myself, and just went by the spellings in the data :)