Tag: Data Analysis

Rick Wicklin 0
Modeling Finite Mixtures with the FMM Procedure

In my previous post, I blogged about how to sample from a finite mixture distribution. I showed how to simulate variables from populations that are composed of two or more subpopulations. Modeling a response variable as a mixture distribution is an active area of statistics, as judged by many talks

Rick Wicklin 0
The effect of holidays on US births

Last week I showed a graph of the number of US births for each day in 2002, which shows a strong day-of-the-week effect. The graph also shows that the number of births on a given day is affected by US holidays. This blog post looks closer at the holiday effect.

Rick Wicklin 0
Visualizing Scrabble games

My elderly mother enjoys playing Scrabble®. The only problem is that my father and most of my siblings won't play with her because she beats them all the time! Consequently, my mother is always excited when I visit because I'll play a few Scrabble games with her. During a recent

Rick Wicklin 0
Visualizing correlations between variables in SAS

Exploring correlation between variables is an important part of exploratory data analysis. Before you start to model data, it is a good idea to visualize how variables related to one another. Zach Mayer, on his Modern Toolmaking blog, posted code that shows how to display and visualize correlations in R.

Rick Wicklin 0
Complex assignment statements: CHOOSE wisely

This article describes the SAS/IML CHOOSE function: how it works, how it doesn't work, and how to use it to make your SAS/IML programs more compact. In particular, the CHOOSE function has a potential "gotcha!" that you need to understand if you want your program to perform as expected. What

Rick Wicklin 0
Improving graphs of highly correlated data

If you create a scatter plot of highly correlated data, you will see little more than a thin cloud of points. Small-scale relationships in the data might be masked by the correlation. For example, Luke Miller recently posted a scatter plot that compares the body temperature of snails when they

Rick Wicklin 0
Overlaying two histograms in SAS

A reader commented to me that he wants to use the HISTOGRAM statement of the SGPLOT procedure to overlay two histograms on a single plot. He could do it, but unfortunately SAS was choosing a large bin width for one of the variables and a small bin width for the

Rick Wicklin 0
A statistical analysis of coupons

Each Sunday, my local paper has a starburst image on the front page that proclaims "Up to $169 in Coupons!" (The value changes from week to week.) One day I looked at the image and thought, "Does the paper hire someone to count the coupons? Is this claim a good

Rick Wicklin 0
Finding data that satisfy a criterion

A fundamental operation in data analysis is finding data that satisfy some criterion. How many people are older than 85? What are the phone numbers of the voters who are registered Democrats? These questions are examples of locating data with certain properties or characteristics. The SAS DATA step has a

Rick Wicklin 0
Calling R from SAS/IML software

For years I've been making presentations about SAS/IML software at conferences. Since 2008, I've always mentioned to SAS customers that they can call R from within SAS/IML software. (This feature was introduced in SAS/IML Studio 3.2 and was added to the IML procedure in SAS/IML 9.22.) I also included a

Rick Wicklin 0
The COALESCE function: PROC SQL compared with PROC IML

When Charlie H. posted an interesting article titled "Top 10 most powerful functions for PROC SQL," there was one item on his list that was unfamiliar: the COALESCE function. (Edit: Charlie's blog no longer exists. The article used to be available at http://www.sasanalysis.com/2011/01/top-10-most-powerful-functions-for-proc.html) Ever since I posted my first response,

Rick Wicklin 0
Where do major airlines fly?

Last week the Flowing Data blog published an excellent visualization of the flight patterns of major US airlines. On Friday, I sent the link to Robert Allison, my partner in the 2009 ASA Data Expo, which explored airline data. Robert had written a SAS program for the Expo that plots

Rick Wicklin 0
Funnel plots: An alternative to ranking

In a previous blog post, I showed how you can use simulation to construct confidence intervals for ranks. This idea (from a paper by E. Marshall and D. Spiegelhalter), enables you to display a graph that compares the performance of several institutions, where "institutions" can mean schools, companies, airlines, or

Rick Wicklin 0
The sound of the Dow...in SAS

At the beginning of 2011, I heard about the Dow Piano, which was created by CNNMoney.com. The Dow Piano visualizes the performance of the Dow Jones industrial average in 2010 with a line plot, but also adds an auditory component. As Bård Edlund, Art Director at CNNMoney.com, said, The daily

Rick Wicklin 0
Computing the variance of each column of a matrix

In a previous blog post about computing confidence intervals for rankings, I inadvertently used the VAR function in SAS/IML 9.22, without providing equivalent functionality for those readers who are running an earlier version of SAS/IML software. (Thanks to Eric for pointing this out.) If you are using a version of

Rick Wicklin 0
How to rank values

When comparing scores from different subjects, it is often useful to rank the subjects. A rank is the order of a subject when the associated score is listed in ascending order. I've written a few articles about the importance of including confidence intervals when you display rankings, but I haven't

1 13 14 15 16