Tag: Data Analysis

Rick Wicklin 0
Wealth and winning in NC high school athletics

The Raleigh News & Observer published a front-page article about the effect of wealth and poverty on high school athletics in North Carolina. In particular, the article concluded that "high schools with a high percentage of poor students rarely win titles in the so-called country club sports—tennis, golf and swimming—and

Rick Wicklin 0
The relationship between toothlessness and income

My colleague Robert Allison finds the most interesting data sets to visualize! Yesterday he posted a visualization of toothless seniors in the US. More precisely, he created graphs that show the estimated prevalence of adults (65 years or older) who have had all their natural teeth extracted. The dental profession

Rick Wicklin 0
How to Winsorize data in SAS

Recently a SAS customer asked how to Winsorize data in SAS. Winsorization is best known as a way to construct robust univariate statistics. The Winsorized mean is a robust estimate of location. The Winsorized mean is similar to the trimmed mean, and both are described in the documentation for PROC

Rick Wicklin 0
The spiral of splatter

"Daddy, help! Help me! Come quick!" I heard my daughter's screams from the upstairs bathroom and bounded up the stairs two at a time. Was she hurt? Bleeding? Was the toilet overflowing? When I arrived in the doorway, she pointed at the wall and at the floor. The wall was

Rick Wicklin 0
Popular posts from The DO Loop in 2014

I published 118 blog posts in 2014. This article presents my most popular posts from 2014 and late 2013. 2014 will always be a special year for me because it was the year that the SAS University Edition was launched. The University Edition means that SAS/IML is available to all

Rick Wicklin 0
Resampling and permutation tests in SAS

My colleagues at the SAS & R blog recently posted an example of how to program a permutation test in SAS and R. Their SAS implementation used Base SAS and was "relatively cumbersome" (their words) when compared with the R code. In today's post I implement the permutation test in

Rick Wicklin 0
Does this kurtosis make my tail look fat?

What is kurtosis? What does negative or positive kurtosis mean, and why should you care? How do you compute kurtosis in SAS software? It is not clear from the definition of kurtosis what (if anything) kurtosis tells us about the shape of a distribution, or why kurtosis is relevant to

Rick Wicklin 0
Create discrete heat maps in SAS/IML

In a previous article I introduced the HEATMAPCONT subroutine in SAS/IML 13.1, which makes it easy to visualize matrices by using heat maps with continuous color ramps. This article introduces a companion subroutine. The HEATMAPDISC subroutine, which also requires SAS/IML 13.1, is designed to visualize matrices that have a small

Rick Wicklin 0
The frequency of bigrams in an English corpus

In last week's article about the distribution of letters in an English corpus, I presented research results by Peter Norvig who used Google's digitized library and tabulated the frequency of each letter. Norvig also tabulated the frequency of bigrams, which are pairs of letters that appear consecutively within a word.

Rick Wicklin 0
Designing a quantile bin plot

While at JSM 2014 in Boston, a statistician asked me whether it was possible to create a "customized bin plot" in SAS. When I asked for more information, she told me that she has a large data set. She wants to visualize the data, but a scatter plot is not

Rick Wicklin 0
Skew this

The skewness of a distribution indicates whether a distribution is symmetric or not. A distribution that is symmetric about its mean has zero skewness. In contrast, if the right tail of a unimodal distribution has more mass than the left tail, then the distribution is said to be "right skewed"

Rick Wicklin 0
The frequency of letters in an English corpus

It's time for another blog post about ciphers. As I indicated in my previous blog post about substitution ciphers, the classical substitution cipher is no longer used to encrypt ultra-secret messages because the enciphered text is prone to a type of statistical attack known as frequency analysis. At the root

Rick Wicklin 0
How to create a hexagonal bin plot in SAS

While I was working on my recent blog post about two-dimensional binning, a colleague asked whether I would be discussing "the new hexagonal binning method that was added to the SURVEYREG procedure in SAS/STAT 13.2." I was intrigued: I was not aware that hexagonal binning had been added to a

Learn SAS
Rick Wicklin 0
Choosing bins for histograms in SAS

When you create a histogram with statistical software, the software uses the data (including the sample size) to automatically choose the width and location of the histogram bins. The resulting histogram is an attempt to balance statistical considerations, such as estimating the underlying density, and "human considerations," such as choosing

1 9 10 11 12 13 16