My colleague Robert Allison finds the most interesting data sets to visualize! Yesterday he posted a visualization of toothless seniors in the US. More precisely, he created graphs that show the estimated prevalence of adults (65 years or older) who have had all their natural teeth extracted. The dental profession calls these people edentulous. According to the CDC, about 20% of seniors (more than 35 million Americans) are edentulous.
When I looked at his sorted bar chart, I noticed that the states with the highest percentages of toothless seniors seemed to be poorer states such as West Virginia, Kentucky, Tennessee, Mississippi, and Louisiana. In contrast, richer states such as Colorado, Connecticut, and Hawaii had relatively low percentages. I wondered whether there was a correlation between a state's median income and its prevalence of edentulous seniors.
Rob always publishes his SAS programs, so I merged his data with the state median household incomes (2-year-average medians) as reported by the US Census Bureau. Then I used the SGPLOT procedure to plot the prevalence of toothlessness (with lower and upper confidence limits) versus the median state income. I also added a loess curve as a visual smoother, as follows:
title "All Teeth Extracted vs. Median Income";
proc sgplot data=all;
   scatter x=income y=pct / yerrorlower=LCL yerrorupper=UCL datalabel=N;
   loess x=income y=pct;
run;
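The all data set in the PROC SGPLOT call is the result of the merge described earlier. A minimal sketch of that step might look like the following. The input data set names (Teeth and Income) and the key variable (State) are hypothetical placeholders, not names from Rob's program or the Census file, so substitute whatever the actual sources use.

```sas
/* Sketch only: Teeth, Income, and State are assumed names. */
/* A match-merge requires both inputs to be sorted by the key. */
proc sort data=Teeth;  by State; run;
proc sort data=Income; by State; run;

data all;
   merge Teeth(in=inTeeth) Income(in=inIncome);
   by State;
   if inTeeth and inIncome;   /* keep only states found in both sources */
run;
```

The IN= flags guard against a state that appears in one source but not the other, which would otherwise produce a row with a missing income or a missing percentage.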
The scatter plot shows a moderately strong negative linear correlation (ρ = -0.63) between median income and the prevalence of toothlessness, and the loess curve indicates that the relationship is stronger for states in which the median income is less than $50,000. The confidence intervals indicate that most of the data are well approximated by the loess fit, but there are a few outliers.
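One way to obtain a correlation estimate like the one quoted above is PROC CORR. This is a sketch that assumes the same all data set and the income and pct variables used in the plot; it is not necessarily how the value in this post was computed.

```sas
/* Pearson correlation between median income and percent edentulous */
proc corr data=all pearson nosimple;
   var income pct;
run;
```

The NOSIMPLE option suppresses the table of descriptive statistics so that the output contains only the correlation matrix and the associated p-values.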
Two states in the upper left corner of the graph (West Virginia and Kentucky) have a prevalence of edentulism that is much higher than suggested by the model that uses only median household income. Several states, including Montana, Florida, and Hawaii, have a much lower prevalence than the model suggests. For easy identification of the states on the scatter plot, you can create a second scatter plot that does not contain the confidence limits and instead displays the state names as labels.
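A labeled version of the plot could be sketched as follows. The variable name StateName is an assumption (the actual name of the state variable in the merged data may differ), and the NOMARKERS option on the LOESS statement is my choice to avoid drawing each marker twice.

```sas
/* Sketch only: StateName is an assumed variable holding the state names. */
title "All Teeth Extracted vs. Median Income";
proc sgplot data=all;
   scatter x=income y=pct / datalabel=StateName;  /* label each marker */
   loess x=income y=pct / nomarkers;              /* draw the curve only */
run;
```

Dropping the YERRORLOWER= and YERRORUPPER= options removes the error bars, which leaves room for the labels.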
Like Rob, I always post the full SAS code that creates the analyses and graphs in my blog posts, so feel free to play with the data and create more visualizations.
And regardless of your income or state of residence, brush your teeth!