My colleague Robert Allison finds the most interesting data sets to visualize! Yesterday he posted a visualization of toothless seniors in the US. More precisely, he created graphs that show the estimated prevalence of adults (65 years or older) who have had all their natural teeth extracted. The dental profession
Tag: Data Analysis
My son is in high school and plans to take the ACT, a standardized test to assess college aptitude and readiness. My wife asked, "What is a good score for the ACT?" I didn't know, but I did a quick internet search and discovered a tabulation of scores for the
Recently a SAS customer asked how to Winsorize data in SAS. Winsorization is best known as a way to construct robust univariate statistics. The Winsorized mean is a robust estimate of location. The Winsorized mean is similar to the trimmed mean, and both are described in the documentation for PROC
When you count the outcomes of an experiment, you do not always observe all of the possible outcomes. For example, if you roll a six-sided die 10 times, it might be that the "1" face does not appear in those 10 rolls. Obviously, this situation occurs more frequently with small
"Daddy, help! Help me! Come quick!" I heard my daughter's screams from the upstairs bathroom and bounded up the stairs two at a time. Was she hurt? Bleeding? Was the toilet overflowing? When I arrived in the doorway, she pointed at the wall and at the floor. The wall was
I was reading a statistics book when I encountered a histogram that caught my eye. The histogram looked similar to the one at the left. It contained a normal density estimate overlaid on a histogram, but the height of the density curve seemed too short when compared to the heights
A SAS programmer asked for a list of SAS/IML functions that operate on the columns of an n x p matrix and return a 1 x p row vector of results. The functions that behave this way tend to compute univariate descriptive statistics such as the mean, median, standard deviation, and quantiles. The following
Perhaps you saw the headlines earlier this week about the fact that it has been nine years since the last major hurricane (category 3, 4, or 5) hit the US coast. According to a post on the GeoSpace blog, which is published by the American Geophysical Union (AGU), researchers ran
Suppose that you compute the correlation matrix (call it R1) for a set of variables x1, x2, ..., x8. For some reason, you later want to compute the correlation matrix for the variables in a different order, maybe x2, x1, x7,..., x6. Do you need to go back to the
Imagine that you have one million rows of numerical data and you want to determine if a particular "target" value occurs. How might you find where the value occurs? For univariate data, this is an easy problem. In the SAS DATA step you can use a WHERE clause or a