Recently the "SAS Sample of the Day" was a Knowledge Base article with an impressively long title: Sample 42165: Using a stored process to eliminate duplicate values caused by multiple group memberships when creating a group-based, identity-driven filter in SAS® Information Map Studio "Wow," I thought. "This is the longest
Tag: Data Analysis
Halloween night was rainy, so many fewer kids knocked on the door than usual. Consequently, I'm left with a big bucket of undistributed candy. One evening as I treated myself to a mouthful of tooth decay, I chanced to open a package of Wonka® Bottle Caps. The package contained three
Being able to reshape data is a useful skill in data analysis. Most of the time you can use the TRANSPOSE procedure or the SAS DATA step to reshape your data. But the SAS/IML language can be handy, too. I only use PROC TRANSPOSE a few times per year, so
"I think that my data are exponentially distributed, but how can I check?" I get asked that question a lot. Well, not specifically that question. Sometimes the question is about the normal, lognormal, or gamma distribution. A related question is "Which distribution does my data have," which was recently discussed
When I learn a new statistical technique, one of first things I do is to understand the limitations of the technique. This blog post shares some thoughts on modeling finite mixture models with the FMM procedure. What is a reasonable task for FMM? When are you asking too much? I
A popular use of SAS/IML software is to optimize functions of several variables. One statistical application of optimization is estimating parameters that optimize the maximum likelihood function. This post gives a simple example for maximum likelihood estimation (MLE): fitting a parametric density estimate to data. Which density curve fits the
When you misspell a word on your mobile device or in a word-processing program, the software might "autocorrect" your mistake. This can lead to some funny mistakes, such as the following: I hate Twitter's autocorrect, although changing "extreme couponing" to "extreme coupling" did make THAT tweet more interesting. [@AnnMariaStat] When
In my previous post, I blogged about how to sample from a finite mixture distribution. I showed how to simulate variables from populations that are composed of two or more subpopulations. Modeling a response variable as a mixture distribution is an active area of statistics, as judged by many talks
Last week I showed a graph of the number of US births for each day in 2002, which shows a strong day-of-the-week effect. The graph also shows that the number of births on a given day is affected by US holidays. This blog post looks closer at the holiday effect.
My friend Chris posted an analysis of the distribution of birthdays for 236 of his Facebook friends. He noted that more of his friends have birthdays in April than in September. The numbers were 28 for April, but only 25 for September. As I reported in my post on "the