By almost any measure, Rick Wicklin is the most prolific and most popular blogger at SAS. Author of The Do Loop, Rick has been writing and publishing posts at SAS since 2010, and today he publishes his 1,000th post on The Do Loop!
To celebrate, I want to highlight some of the posts from the history of The Do Loop.
Let's start by looking at Rick's very first post. I think you'll appreciate the programmer's introduction.
When programmers begin learning a new computer language, the first program they write is often one that prints the text “Hello, World!” Successfully writing a Hello World program assures the programmer that the software is successfully installed and that all necessary features are working: parsers, compilers, linkers, and so on. In fact, long before viral videos ruled the internet, many of us circulated a viral email that describes the evolution of a programmer in terms of Hello World programs.
Next let's look at the post that continues to be the most popular post on The Do Loop every year. Naturally, it concerns looping and how to loop in SAS.
Looping is essential to statistical programming. Whether you need to iterate over parameters in an algorithm or indices in an array, a loop is often one of the first programming constructs that a beginning programmer learns. Today is the first anniversary of this blog, which is named The DO Loop, so it seems appropriate to blog about DO loops in SAS. I'll describe looping in the SAS DATA step and compare it with looping in the SAS/IML language.
What about the most-commented-on post? You might want to set aside some time to read all 170+ comments on this one.
The log transformation is one of the most useful transformations in data analysis. It is used as a transformation to normality and as a variance stabilizing transformation. A log transformation is often used as part of exploratory data analysis in order to visualize (and later model) data that ranges over several orders of magnitude. Common examples include data on income, revenue, populations of cities, sizes of things, weights of things, and so forth. (Remember, however, that you do not have to transform variables! Some people mistakenly believe that linear regression requires normally distributed variables. It does not!)
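To make the "several orders of magnitude" point concrete, here is a minimal sketch in Python (chosen only for illustration; it is not code from Rick's post). The values are made up and stand in for something like city populations.

```python
import math

# Hypothetical values spanning four orders of magnitude (illustrative only).
values = [1, 10, 100, 1_000, 10_000]

# A base-10 log transform maps each order of magnitude to one unit step.
log_values = [math.log10(v) for v in values]

# On the raw scale the largest gap dwarfs the smallest;
# on the log scale every gap has the same size.
raw_gaps = [b - a for a, b in zip(values, values[1:])]
log_gaps = [b - a for a, b in zip(log_values, log_values[1:])]

print("raw gaps:", raw_gaps)
print("log gaps:", [round(g, 6) for g in log_gaps])
```

On the raw scale the first four points would be squashed against the axis in a plot, while on the log scale they are evenly spaced.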
My next few categories are getting a bit subjective, but I think this one is possibly the most popular nerdy math post, and it tells a great story with great visuals.
"Daddy, help! Help me! Come quick!" I heard my daughter's screams from the upstairs bathroom and bounded up the stairs two at a time. Was she hurt? Bleeding? Was the toilet overflowing? When I arrived in the doorway, she pointed at the wall and at the floor. The wall was splattered with black nail polish. On the floor lay a broken bottle in an expanding pool of black ooze. "It slipped," she sobbed.
The best guest post on The Do Loop was written by Rick's son for a science project about dryer balls.
Hi! My name is David Wicklin. This blog post describes a research project and statistical graphs that I created for the 2013 ASA Poster Competition. My poster began one day when my sister saw a small box on a store shelf that contained two spiked plastic balls. The balls were about the size of tennis balls. The box said that you should put these "dryer balls" in your dryer with a load of wet laundry. The box claimed that the balls would "reduce drying time by up to 25%." I was skeptical. Could putting plastic balls in the dryer really reduce drying time?
What about the post with the most media attention? Of course, it involves M&Ms, but did you know it was written up in a Quartz magazine article, "A statistician got curious about M&M colors and went on an endearingly geeky quest for answers"?
Many introductory courses in probability and statistics encourage students to collect and analyze real data. A popular experiment in categorical data analysis is to give students a bag of M&M® candies and ask them to estimate the proportion of colors in the population from the sample data. In some classes, the students are also asked to perform a chi-square analysis to test whether the colors are uniformly distributed or whether the colors match a hypothetical set of proportions.
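The classroom exercise described in that excerpt can be sketched in a few lines of Python. This is an illustration only: the color counts below are invented (they are not Rick's data), and the function name is hypothetical. The sketch computes the chi-square goodness-of-fit statistic against a uniform distribution and compares it with the standard 5% critical value for 5 degrees of freedom.

```python
def chi_square_statistic(observed, expected_props):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    total = sum(observed)
    return sum(
        (obs - total * p) ** 2 / (total * p)
        for obs, p in zip(observed, expected_props)
    )

# Hypothetical counts from one bag of candies (made up for illustration).
counts = {"red": 10, "orange": 12, "yellow": 8, "green": 15, "blue": 9, "brown": 6}

# Null hypothesis: all six colors are equally likely.
uniform = [1 / len(counts)] * len(counts)
stat = chi_square_statistic(list(counts.values()), uniform)

# 95th percentile of the chi-square distribution with 6 - 1 = 5 degrees of freedom.
CRITICAL_5PCT_DF5 = 11.07
reject_uniform = stat > CRITICAL_5PCT_DF5

print(f"chi-square = {stat:.3f}, reject uniform at 5%: {reject_uniform}")
```

With these made-up counts the statistic is about 5.0, which is below the critical value, so a single small bag would not be enough to reject the uniform hypothesis. To test a different hypothetical set of proportions, pass them in place of `uniform`.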
Next is Rick's post with the most Wikipedia traffic: it gets a lot of visitors from the Wikipedia entry on the same topic.
I previously described how to use Mahalanobis distance to find outliers in multivariate data. This article takes a closer look at Mahalanobis distance. A subsequent article will describe how you can compute Mahalanobis distance.
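If you have not met Mahalanobis distance before, here is a small Python sketch of the standard definition, the square root of (x - mean)' * inverse(cov) * (x - mean). It is restricted to two dimensions so it needs no libraries. It is not the code from Rick's article, and the mean, covariance, and points are made up for illustration.

```python
import math

def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point from `mean`, given a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    quad = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(quad)

# Hypothetical distribution: variance 4 along the first axis, 1 along the second.
mean = (0.0, 0.0)
cov = ((4.0, 0.0), (0.0, 1.0))

# Both points are at Euclidean distance 2 from the mean.
along_wide_axis = mahalanobis_2d((2.0, 0.0), mean, cov)    # 1.0
along_narrow_axis = mahalanobis_2d((0.0, 2.0), mean, cov)  # 2.0

print(along_wide_axis, along_narrow_axis)
```

The two points are equally far from the mean in the Euclidean sense, but the one that lies along the low-variance direction is twice as far in Mahalanobis distance, which is why the measure is useful for spotting multivariate outliers.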
And finally, the post with the best headline is debatable, but I like this one.
The skewness of a distribution indicates whether a distribution is symmetric or not. A distribution that is symmetric about its mean has zero skewness. In contrast, if the right tail of a unimodal distribution has more mass than the left tail, then the distribution is said to be "right skewed" or to have positive skewness. Similarly, negatively skewed distributions have long (or heavy) left tails.
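The sign convention in that excerpt is easy to check numerically. The Python sketch below uses the simple moment-based skewness (third central moment divided by the 1.5 power of the second). Statistical software often reports a bias-adjusted variant, so treat the magnitudes as illustrative. The data are made up.

```python
def moment_skewness(data):
    """Moment-based skewness: m3 / m2**1.5, using central moments."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

# Hypothetical samples (illustrative only).
symmetric = [1, 2, 3, 4, 5]                   # symmetric about its mean
right_skewed = [1, 1, 1, 2, 10]               # long right tail
left_skewed = [-x for x in right_skewed]      # mirror image: long left tail

skew_symmetric = moment_skewness(symmetric)
skew_right = moment_skewness(right_skewed)
skew_left = moment_skewness(left_skewed)

print(skew_symmetric, skew_right, skew_left)
```

The symmetric sample has zero skewness, the sample with the long right tail has positive skewness, and its mirror image has the same magnitude with a negative sign.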
What are your favorite posts from The Do Loop? Help Rick celebrate by commenting here or rushing over to his 1,000th post and telling him there how much you appreciate his content.